Basic Facts about GPUs

https://news.ycombinator.com/rss Hits: 14
Summary

Last updated: 2025-06-18

I’ve been trying to get a better sense of how GPUs work. I’ve read a lot online, but the following posts were particularly helpful:

- Making Deep Learning Go Brrrr From First Principles
- What Shapes Do Matrix Multiplications Like?
- How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog

This post collects various facts I learned from these resources.

Acknowledgements: Thanks to Alex McKinney for comments on independent thread scheduling.

Compute and memory hierarchy

A GPU’s design creates an imbalance: it can compute much faster than it can access its main memory. An NVIDIA A100 GPU, for example, can perform 19.5 trillion 32-bit floating-point operations per second (19.5 TFLOPS), but its memory bandwidth is only about 1.5 terabytes per second (TB/s). At that bandwidth the GPU reads roughly 0.375 trillion 4-byte numbers per second, so in the time it takes to read a single 4-byte number it could have performed over 50 calculations (19.5 / 0.375 = 52).

Below is a diagram of the compute and memory hierarchy for an NVIDIA A100 GPU. The FLOPS and TB/s figures I quote are specific to the A100.

+---------------------------------------------------------------------------------+
|                              Global Memory (VRAM)                               |
|                           (~40 GB, ~1.5 TB/s on A100)                           |
+----------------------------------------+----------------------------------------+
                                         | (Slow off-chip bus)
+----------------------------------------v----------------------------------------+
|                          Streaming Multiprocessor (SM)                          |
|               (1 of 108 SMs on an A100, each ~(19.5/108) TFLOPS)                |
|                       (2048 threads, 64 warps, 32 blocks)                       |
| +-----------------------------------------------------------------------------+ |
| |                       Shared Memory (SRAM) / L1 Cache                       | |
| |                   (~192 KB on-chip workbench, 19.5 TB/s)                    | |
| +-----------------------------------------------------------------------------+ |
| |                       Register File (~256 KB, ? TB/s)                       | |
| +-----------------------------------------------------------------------------+ |
| |                                                                             | |
| | //-- A "Block" of threads runs on one SM --//                               | |
| | +---------...
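
The compute-versus-bandwidth imbalance described in this section can be checked with a few lines of arithmetic. The sketch below is only an illustration: the constants are the A100 figures quoted in the text, and the function names (`flops_per_value_read`, `per_sm_tflops`) are hypothetical helpers introduced here, not part of any library.

```python
# Back-of-the-envelope check of the A100 figures quoted in the text.
# All constants come from the summary above; the helper names are
# hypothetical and exist only for this illustration.

PEAK_FLOPS = 19.5e12       # FP32 operations per second (19.5 TFLOPS)
MEM_BANDWIDTH = 1.5e12     # bytes per second from global memory (~1.5 TB/s)
NUM_SMS = 108              # streaming multiprocessors on an A100
BYTES_PER_FLOAT32 = 4      # size of one 32-bit float


def flops_per_value_read(peak_flops: float, bandwidth: float, bytes_per_value: int) -> float:
    """Operations the GPU could perform in the time it takes to read one value."""
    seconds_per_value = bytes_per_value / bandwidth
    return peak_flops * seconds_per_value


def per_sm_tflops(peak_flops: float, num_sms: int) -> float:
    """Peak throughput of a single SM, in TFLOPS, assuming an even split."""
    return peak_flops / num_sms / 1e12


if __name__ == "__main__":
    ratio = flops_per_value_read(PEAK_FLOPS, MEM_BANDWIDTH, BYTES_PER_FLOAT32)
    print(f"FLOPs per 4-byte read: {ratio:.1f}")
    print(f"Per-SM peak: {per_sm_tflops(PEAK_FLOPS, NUM_SMS):.3f} TFLOPS")
```

With the quoted figures the ratio comes out to about 52 operations per 4-byte read, which matches the "over 50 calculations" claim. Swapping in another GPU's peak throughput and bandwidth gives the same kind of estimate for that device.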

First seen: 2025-06-24 14:12

Last seen: 2025-06-25 03:16