An optimizing compiler doesn't help much with long instruction dependencies

Summary

We at Johnny’s Software Lab LLC are experts in performance. If performance is in any way a concern in your software project, feel free to contact us.

There was a rumor I read somewhere related to training AI models, something along the lines of “whether we compile our code in debug mode or release mode, it doesn’t matter, because our models are huge, all of our code is memory bound”.

I wanted to find out whether this is true for the cases that are interesting to me, so I wrote a few small kernels to investigate. Here is the first:

    for (size_t i { 0ULL }; i < pointers.size(); i++) {
        sum += vector[pointers[i]];
    }

This is a very memory-intensive kernel. The data from vector is read from random locations. Depending on the size of vector, we can experiment with data being read from the L1, L2, or L3 cache, or from main memory.

I compiled this loop with GCC at optimization levels -O0 (no optimizations) and -O3 (full optimizations). Then I calculated two ratios: the instruction count ratio, instruction_count(O0) / instruction_count(O3), and the runtime ratio, runtime(O0) / runtime(O3). On imaginary perfect hardware, where the runtime is proportional to the instruction count and doesn’t depend on memory at all, the graph could look like this:

In the above graph, an O0 version might have 10 times more instructions than the O3 version, and therefore it should be ten times slower than the O3 version, regardless of the vector size.

Of course, real hardware is different. The same graph for the same loop executed on my brand new AMD Ryzen 9 PRO 8945HS w/ Radeon 780M Graphics looks like this:

The O0 version executes almost 10 times more instructions than the O3 version, but when the dataset is big enough, this doesn’t matter much: the O3 version is only about 3 times faster than the O0 version. So, the claim is (at least) partially true. Of course, being three times faster is something one should nevertheless appreciate, especially since very memory-intensive codes like the one above don’t appear too often (but they do appear, e.g. the above loop is very simi...
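
For readers who want to try the experiment, here is a minimal sketch of how the first kernel could be wrapped for timing. It is not the author's benchmark harness: the element type (`std::uint32_t`), the parameter name `data` (standing in for `vector` in the text), and the helpers `make_random_indices` and `time_gather` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: builds `count` random indices into a vector of `size`
// elements, so that successive loads land on unpredictable addresses.
std::vector<std::size_t> make_random_indices(std::size_t size, std::size_t count,
                                             std::uint64_t seed) {
    assert(size > 0);  // uniform_int_distribution needs a non-empty range
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<std::size_t> dist(0, size - 1);
    std::vector<std::size_t> pointers(count);
    for (auto& p : pointers) {
        p = dist(rng);
    }
    return pointers;
}

// The kernel from the text: an indirect (gather) sum. The element type is an
// assumption; the text does not say what `vector` holds.
std::uint64_t gather_sum(const std::vector<std::uint32_t>& data,
                         const std::vector<std::size_t>& pointers) {
    std::uint64_t sum = 0;
    for (std::size_t i { 0ULL }; i < pointers.size(); i++) {
        sum += data[pointers[i]];
    }
    return sum;
}

// Hypothetical helper: times a single run of the kernel and returns seconds.
// The sum is written to `sum_out` so the compiler cannot discard the loop.
double time_gather(const std::vector<std::uint32_t>& data,
                   const std::vector<std::size_t>& pointers,
                   std::uint64_t& sum_out) {
    const auto start = std::chrono::steady_clock::now();
    sum_out = gather_sum(data, pointers);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

One way to use this would be to call `time_gather` from a small `main` for a range of `data` sizes (from a few KiB up to well beyond the last-level cache), build the program once with `g++ -O0` and once with `g++ -O3`, and divide the two runtimes. Instruction counts could be collected with a tool such as `perf stat -e instructions` on Linux; the text does not say which tool the author used.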

First seen: 2025-06-01 10:31

Last seen: 2025-06-01 16:31