# Linear-Programming-Based Load Balancer (LPLB)

LPLB is a parallel load balancer that leverages linear programming to optimize expert-parallel workload distribution for Mixture-of-Experts (MoE) models. It dynamically reorders experts based on workload statistics, constructs replicas that respect the static topology, and solves for an optimal token assignment per batch to achieve dynamic load balancing. The reordering step is handled by EPLB, and real-time workload statistics can be provided by the user, collected via `torch.distributed`, or obtained through the internal communicators of a DeepEP buffer. The embedded LP solver implements a single-SM Interior Point Method (IPM) and leverages NVIDIA's cuSolverDx and cuBLASDx libraries for efficient linear algebra operations.

LPLB is currently at an early research stage, and its performance improvements are still under evaluation.

## Installation

Prerequisites:

- CUDA Toolkit >= 12.6.3 (required by the cuSolverDx dependency).
- DeepEP is optional, but strongly recommended for practical use.
- EPLB is embedded.

```bash
./download-mathdx.sh
# export NVSHMEM_DIR=...  # Optional
pip install --no-build-isolation .
```

For testing, an editable installation is recommended:

```bash
pip install --no-build-isolation --editable .
pytest tests
```

## Interface and Example

```python
# Global success counter
avail_counter = torch.zeros(1, dtype=torch.int64, device="cuda")

# Define the topology of redundant experts
r2o = torch.tensor([
    [3, 0, 1, 2, 7, 4, 5, 6],
    [6, 7, 4, 5, 0, 1, 2, 3],
]).T.int().cuda()

planner = Planner(
    r2o,
    n_logical_experts + n_redundants_per_rank * ep_size,
    n_logical_experts,
    group=ep_group,
)

# Initialize from a DeepEP `buffer` (optional)
# planner.init_from_deep_ep(buffer)

N_SMS = 100

# Logical expert indices selected by the model
indices = ...

# The planner returns physical expert indices
redirected_indices = planner.run(indices, avail_counter, N_SMS)
```

## How LPLB Works

LPLB extends EPLB (Expert Parallelism...
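To make the "solves optimal token assignments" idea concrete, here is a toy sketch of the underlying linear program: minimize the maximum per-rank load subject to each logical expert's tokens being conserved across its replicas. This is an illustration only, not LPLB's solver; it uses CPU-side `scipy.optimize.linprog` instead of the GPU-side single-SM IPM, and the replica placement and token counts below are invented for the example.

```python
# Toy LP sketch (NOT the actual LPLB solver): split each logical
# expert's tokens across its replicas so the busiest rank's load
# is minimized. Setup is hypothetical: 2 logical experts, each
# replicated on 2 ranks.
import numpy as np
from scipy.optimize import linprog

replica_rank = np.array([0, 1, 0, 1])    # replica r lives on this rank
replica_expert = np.array([0, 0, 1, 1])  # replica r serves this expert
tokens = np.array([90.0, 30.0])          # tokens routed per logical expert

n_replicas, n_ranks, n_experts = 4, 2, 2

# Variables: x[0..3] = tokens assigned to each replica, x[4] = t,
# the maximum rank load. Objective: minimize t.
c = np.zeros(n_replicas + 1)
c[-1] = 1.0

# Equality constraints: replicas of expert e together carry tokens[e].
A_eq = np.zeros((n_experts, n_replicas + 1))
for e in range(n_experts):
    A_eq[e, :n_replicas][replica_expert == e] = 1.0
b_eq = tokens

# Inequality constraints: (load of rank k) - t <= 0.
A_ub = np.zeros((n_ranks, n_replicas + 1))
for k in range(n_ranks):
    A_ub[k, :n_replicas][replica_rank == k] = 1.0
    A_ub[k, -1] = -1.0
b_ub = np.zeros(n_ranks)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n_replicas + 1))
loads = res.x[:n_replicas]
# With 120 tokens total over 2 ranks, the optimal max rank load is 60.
```

Because every expert here has a replica on both ranks, the LP can perfectly equalize the two ranks at 60 tokens each; with a more constrained replica topology (as encoded by `r2o` above), the optimum reflects which loads are actually movable.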