# CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

## 🥳 Introduction

CUDA-L2 is a system that combines large language models (LLMs) and reinforcement learning (RL) to automatically optimize Half-precision General Matrix Multiply (HGEMM) CUDA kernels. CUDA-L2 systematically outperforms the major matmul baselines to date, from the widely used torch.matmul to state-of-the-art NVIDIA closed-source libraries (cuBLAS, cuBLASLt-heuristic, cuBLASLt-AutoTuning).

Paper

*Figure: Speedup of CUDA-L2 over torch.matmul, cuBLAS, cuBLASLt-heuristic, and cuBLASLt-AutoTuning across 1,000 (M, N, K) configurations on A100.*

*Figure: Speedup comparison results across 1,000 (M, N, K) configurations on A100.*

## 🎉 What's New

- [Dec 2, 2025] Released A100-optimized HGEMM kernels across 1,000 configurations.

## 🗒️ To-Do List

- Release HGEMM with a 32-bit accumulator (SM80_16x8x16_F16F16F16F32, officially F32F16F16F32) for A100. The current version only supports a 16-bit accumulator (SM80_16x8x16_F16F16F16F16); the precision sketch at the end of this README illustrates why accumulator width matters.
- Support denser matrix configurations (more configurations).
- Extend to more GPUs (Ada Lovelace, Hopper, Blackwell).
- Easy deployment for open-source LLMs.

## FAQ

**Q: Do A100 kernels apply to other machines like RTX 3090 or H100?**

A: Ideally, kernels trained on A100 should only be used on A100 if you are targeting speedup. They may still yield a speedup on other machines, but this is not guaranteed; the timing sketch below can be used to check on your own hardware. We will progressively release kernels trained on different machines.

**Q: What if I need matrix dimensions (M, N, K) not...**
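To measure whether a released kernel actually beats torch.matmul on your machine, a minimal timing harness along the following lines should suffice. This is a sketch, not part of the repo: `cuda_l2_hgemm` is a hypothetical stand-in for however the released kernels are actually exposed, and the (M, N, K) shape is illustrative; use the repo's own benchmarking scripts if provided.

```python
import torch

def bench(fn, warmup=10, iters=100):
    """Average milliseconds per call, timed with CUDA events."""
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Pick one of the released (M, N, K) configurations.
M, N, K = 4096, 4096, 4096
a = torch.randn(M, K, dtype=torch.float16, device="cuda")
b = torch.randn(K, N, dtype=torch.float16, device="cuda")

t_base = bench(lambda: a @ b)  # torch.matmul baseline (cuBLAS-backed)
print(f"torch.matmul: {t_base:.3f} ms")

# Hypothetical launcher name -- replace with however this repo
# actually exposes its kernels:
# t_l2 = bench(lambda: cuda_l2_hgemm(a, b))
# print(f"CUDA-L2: {t_l2:.3f} ms ({t_base / t_l2:.2f}x vs torch.matmul)")
```

Warmup iterations matter here: the first few launches include one-time costs (context setup, autotuning in some libraries) that would otherwise inflate the baseline.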
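The 16-bit vs 32-bit accumulator item in the to-do list is a numerical-accuracy question, not just a naming one: with fp16 inputs, rounding every partial sum back to fp16 loses precision as K grows. A minimal sketch of the effect (PyTorch only; the shapes are illustrative, and the per-column loop is a crude emulation of fp16 accumulation, not of the actual tensor-core tile order):

```python
import torch

# Illustrative sizes; any large-K shape shows the effect.
M, N, K = 64, 64, 4096
torch.manual_seed(0)
a = torch.randn(M, K, dtype=torch.float16)
b = torch.randn(K, N, dtype=torch.float16)

# Reference: fp16 inputs, fp32 accumulation -- what the planned
# F32F16F16F32 kernels would compute.
ref = a.float() @ b.float()

# Emulate a 16-bit accumulator (as in SM80_16x8x16_F16F16F16F16):
# every partial sum is rounded back to fp16 before the next add.
acc = torch.zeros(M, N, dtype=torch.float16)
for k in range(K):
    acc += torch.outer(a[:, k], b[k, :])

print("max |fp16 acc - fp32 acc|:", (acc.float() - ref).abs().max().item())
```

The fp16-accumulated result drifts visibly from the fp32-accumulated reference at this K, which is why the 32-bit-accumulator release is listed separately rather than as a drop-in rename.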