How to tile matrix multiplication (2023)

Summary

from Guide to Machine Learning on Apr 30, 2023

Matrix multiplication is a staple of deep learning and a well-studied, well-optimized operation. One of the most common optimizations for matrix multiplication is called "tiling," but as common and important as it is, it can be confusing to understand.

Tiling is a valuable technique that improves resource utilization along several dimensions, including power, memory, and compute. Critically, tiling in turn reduces overall latency, which makes it vital for models that rely heavily on dense matrix multiplication. One such example is transformers and their associated Large Language Models; their heavy reliance on dense matrix multiplies for inference makes tiling an important concept to understand and to leverage.

Not sure why dense matrix multiplies are so necessary? For a primer on how Large Language Models work, check out the 3-part series, beginning with Language Intuition for Transformers.

In this post, we'll break down how tiling for matrix multiplication works, again by conveying intuition primarily through illustrations. I'll start with a description of how to tile a single matrix multiply. Here we only cover the most salient parts at a high level.

Let's multiply two $8 \times 8$ matrices $A$ and $B$ normally. To do so, we take the inner product of each row in $A$ with each column in $B$. We illustrate this below.

Here's what that process looks like in more detail:

1. Fetch the first row $A_{0,:}$ (8 fetches).
2. Fetch the first column $B_{:,0}$ (8 fetches).
3. Take the inner product to get one value in our output $O_{0,0}$.
4. Repeat this for all 64 output values.

For each of the 64 values in our output, we need to fetch a total of 16 values: 8 values from $A$ and 8 values from $B$. This means we need $64 \times 16 = 1024$ total fetches.

For our first step, we can simply reuse the first row of $A$. This is pictured below, where we fetch one row of $A$ to compute the entire first row of ...
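
The fetch counting above can be checked with a small Python sketch. This is an illustration under stated assumptions, not the post's own code: the function names, the `fetches` counter, and the test matrices are all hypothetical, and a "fetch" is modeled simply as reading one matrix element. The second function shows one plausible reading of the row-reuse step, where each row of $A$ is fetched once and kept while every column of $B$ is streamed past it. The count it produces is arithmetic on that assumption, not a figure quoted from the post.

```python
def naive_matmul_fetches(A, B):
    """Multiply A (n x k) by B (k x m), re-fetching a row and a column per output."""
    n, k, m = len(A), len(B), len(B[0])
    O = [[0] * m for _ in range(n)]
    fetches = 0  # hypothetical counter: one fetch = one element read
    for i in range(n):
        for j in range(m):
            row = A[i]                          # fetch row A[i, :], k elements
            col = [B[t][j] for t in range(k)]   # fetch column B[:, j], k elements
            fetches += 2 * k
            O[i][j] = sum(a * b for a, b in zip(row, col))
    return O, fetches


def row_reuse_matmul_fetches(A, B):
    """Same product, but each row of A is fetched once and reused for a whole output row."""
    n, k, m = len(A), len(B), len(B[0])
    O = [[0] * m for _ in range(n)]
    fetches = 0
    for i in range(n):
        row = A[i]                              # fetch row A[i, :] once, k elements
        fetches += k
        for j in range(m):
            col = [B[t][j] for t in range(k)]   # columns of B are still re-fetched
            fetches += k
            O[i][j] = sum(a * b for a, b in zip(row, col))
    return O, fetches


N = 8  # matches the 8 x 8 example in the text
A = [[i + j for j in range(N)] for i in range(N)]   # arbitrary test data
B = [[i * j for j in range(N)] for i in range(N)]

O_naive, naive_count = naive_matmul_fetches(A, B)
O_reuse, reuse_count = row_reuse_matmul_fetches(A, B)

print(naive_count)  # 1024, i.e. 64 outputs x 16 fetches
print(reuse_count)  # 576, i.e. 8 rows x (8 + 8 x 8) fetches
```

Both functions return the same output matrix; only the number of element reads differs, which is the quantity tiling aims to reduce.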

First seen: 2025-10-06 22:07

Last seen: 2025-10-07 10:09