Team: William Hu, Drew Wadsworth, Sean Siddens, Stanley Winata, Daniel Fu, Ryan Swann, Muhammad Osama, Christopher Ré, Simran Arora

Links: Arxiv | Code

AI is gated by hardware. We think that opening up AI's compute landscape is one of the most important problems to be working on right now. Building towards this goal, we present HipKittens: SoTA AMD kernels and a collection of opinionated programming primitives that make AMD kernel dev easier! (The name comes from HIP, AMD's equivalent of CUDA.)

## Building towards multi-silicon AI systems

While AI has largely relied on a single hardware vendor to get to its current stage, AMD GPU hardware now offers state-of-the-art peak compute and memory bandwidth. However, this performance is locked away from AI workflows by the lack of mature AMD software.

| Spec | NVIDIA B200 SXM5 | AMD MI355X OAM |
| --- | --- | --- |
| BF16 matrix / tensor | 2.2 PFLOPs | 2.5 PFLOPs |
| MXFP8 matrix / tensor | 4.5 PFLOPs | 5.0 PFLOPs |
| MXFP6 matrix / tensor | 4.5 PFLOPs | 10.1 PFLOPs |
| MXFP4 matrix / tensor | 9.0 PFLOPs | 10.1 PFLOPs |
| Memory capacity | 180 GB | 288 GB |
| Memory bandwidth | 8.0 TB/s | 8.0 TB/s |

Table 1: Hardware overview. Peak memory and compute speeds for the latest-generation GPU platforms.

The AMD software ecosystem includes AITER, a high-performance AI kernel library; PyTorch and a few compilers (Triton, Mojo, TileLang); and Composable Kernel (CK), AMD's C++-based programming model for writing kernels. However, despite gigawatt-scale AMD deployments, the software remains brittle. The existing offerings fail to consistently achieve peak performance. CK kernels frequently underperform (see our evaluations below). AITER and PyTorch are volatile; for instance, the AITER and PyTorch SDPA Llama GQA backwards kernels achieve just 30% and 24% of SoTA performance, respectively, on AMD MI355X GPUs. And the compilers currently sacrifice significant performance and have not yet demonstrated reusable programming primitives for AMD. Further, we find that some critical aspects of hardware functionality around bank conflict avoidan...
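
As a point of reference for the PyTorch SDPA numbers above, here is a minimal sketch of timing the SDPA forward+backward pass for a Llama-style GQA shape on an AMD GPU (ROCm builds of PyTorch expose the usual `torch.cuda` device and events). The shapes, iteration counts, and use of `enable_gqa` (available in recent PyTorch releases) are illustrative assumptions, not the exact configuration behind the numbers in this post.

```python
import torch
import torch.nn.functional as F

# Illustrative Llama-style GQA shape (assumed, not the benchmarked config).
B, S, D = 16, 4096, 128      # batch, sequence length, head dim
H_Q, H_KV = 32, 8            # grouped-query heads: 32 query heads, 8 KV heads
dtype, dev = torch.bfloat16, "cuda"   # ROCm PyTorch uses the "cuda" device name

q = torch.randn(B, H_Q, S, D, device=dev, dtype=dtype, requires_grad=True)
k = torch.randn(B, H_KV, S, D, device=dev, dtype=dtype, requires_grad=True)
v = torch.randn(B, H_KV, S, D, device=dev, dtype=dtype, requires_grad=True)

def fwd_bwd():
    # enable_gqa needs a recent PyTorch (2.5+); older versions require
    # manually repeating the KV heads to match the query heads.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True, enable_gqa=True)
    out.sum().backward()
    q.grad = k.grad = v.grad = None

for _ in range(5):           # warm-up
    fwd_bwd()
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(20):
    fwd_bwd()
end.record()
torch.cuda.synchronize()
print(f"fwd+bwd: {start.elapsed_time(end) / 20:.2f} ms/iter")
```

Running the same harness over different backends (PyTorch SDPA, AITER, a hand-written kernel) and comparing the measured times is how relative figures like the 30% and 24% quoted above can be reproduced.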