# Linear-Programming-Based Load Balancer (LPLB)

LPLB is a parallel load balancer that leverages linear programming to optimize expert-parallel workload distribution for Mixture-of-Experts (MoE) models. It dynamically reorders experts based on workload statistics, constructs replicas that respect the static topology, and solves for an optimal token assignment per batch to achieve dynamic load balancing. The reordering step is handled by EPLB, and real-time workload statistics can be provided by the user, collected via `torch.distributed`, or obtained through the internal communicators of a DeepEP buffer. The embedded LP solver implements a single-SM Interior Point Method (IPM) and leverages NVIDIA's cuSolverDx and cuBLASDx libraries for efficient linear algebra operations.

LPLB is currently at an early research stage, and its performance improvements are still under evaluation.

## Installation

Prerequisites:

- CUDA Toolkit >= 12.6.3 (required by the cuSolverDx dependency).
- DeepEP is optional, but strongly recommended for practical use.
- EPLB is embedded.

```bash
./download-mathdx.sh
# export NVSHMEM_DIR=...  # Optional
pip install --no-build-isolation .
```

For testing, an editable installation is recommended:

```bash
pip install --no-build-isolation --editable .
pytest tests
```

## Interface and Example

```python
# Global success counter
avail_counter = torch.zeros(1, dtype=torch.int64, device="cuda")

# Define the topology of redundant experts
r2o = torch.tensor([
    [3, 0, 1, 2, 7, 4, 5, 6],
    [6, 7, 4, 5, 0, 1, 2, 3],
]).T.int().cuda()

planner = Planner(
    r2o,
    n_logical_experts + n_redundants_per_rank * ep_size,
    n_logical_experts,
    group=ep_group,
)

# Initialize from a DeepEP `buffer` (optional)
# planner.init_from_deep_ep(buffer)

N_SMS = 100

# Logical expert indices selected by the model
indices = ...

# The planner returns physical expert indices
redirected_indices = planner.run(indices, avail_counter, N_SMS)
```

## How LPLB Works

LPLB extends EPLB (Expert Parallelism...
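To make the "solves optimal token assignments" idea concrete, here is a toy sketch of the underlying linear program: minimize the maximum per-rank load subject to each logical expert's tokens being conserved across its replicas. This is an illustration only, not LPLB's solver; it uses CPU-side `scipy.optimize.linprog` instead of the GPU-side single-SM IPM, and the replica placement and token counts below are invented for the example.

```python
# Toy LP sketch (NOT the actual LPLB solver): split each logical
# expert's tokens across its replicas so the busiest rank's load
# is minimized. Setup is hypothetical: 2 logical experts, each
# replicated on 2 ranks.
import numpy as np
from scipy.optimize import linprog

replica_rank = np.array([0, 1, 0, 1])    # replica r lives on this rank
replica_expert = np.array([0, 0, 1, 1])  # replica r serves this expert
tokens = np.array([90.0, 30.0])          # tokens routed per logical expert

n_replicas, n_ranks, n_experts = 4, 2, 2

# Variables: x[0..3] = tokens assigned to each replica, x[4] = t,
# the maximum rank load. Objective: minimize t.
c = np.zeros(n_replicas + 1)
c[-1] = 1.0

# Equality constraints: replicas of expert e together carry tokens[e].
A_eq = np.zeros((n_experts, n_replicas + 1))
for e in range(n_experts):
    A_eq[e, :n_replicas][replica_expert == e] = 1.0
b_eq = tokens

# Inequality constraints: (load of rank k) - t <= 0.
A_ub = np.zeros((n_ranks, n_replicas + 1))
for k in range(n_ranks):
    A_ub[k, :n_replicas][replica_rank == k] = 1.0
    A_ub[k, -1] = -1.0
b_ub = np.zeros(n_ranks)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n_replicas + 1))
loads = res.x[:n_replicas]
# With 120 tokens total over 2 ranks, the optimal max rank load is 60.
```

Because every expert here has a replica on both ranks, the LP can perfectly equalize the two ranks at 60 tokens each; with a more constrained replica topology (as encoded by `r2o` above), the optimum reflects which loads are actually movable.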