Byte Type: Supporting Raw Data Copies in the LLVM IR

Summary

GSoC 2025 - Byte Type: Supporting Raw Data Copies in the LLVM IR

This summer I participated in GSoC under the LLVM Compiler Infrastructure. The goal of the project was to add a new byte type to the LLVM IR, capable of representing raw memory values. This addition enables native implementations of memory-related intrinsics such as memcpy, memmove and memcmp in the IR, fixes existing unsound transformations, and enables new optimizations, all with minimal performance impact.

Background

One of LLVM's longstanding problems is the absence of a type capable of representing raw memory values. Currently, raw bytes are loaded from memory through an appropriately sized integer type. However, integers cannot represent an arbitrary memory value. First, they do not retain pointer provenance information, so they cannot fully specify the value of a pointer. Second, loading a memory value that contains poison bits through an integer type taints the whole loaded value: an integer is either entirely poison or fully defined, with no way to represent individual poison bits.

Source languages such as C and C++ provide dedicated types for inspecting and manipulating raw memory: char, signed char and unsigned char. C++17 introduced std::byte, which offers similar raw memory access capabilities but does not support arithmetic operations. Clang currently lowers these types to the i8 integer type, which does not accurately model their raw memory access semantics and has led to miscompilations such as the one reported in bug report 37469.

The absence of an equivalent type in the LLVM IR hinders the implementation of memory-related intrinsics such as memcpy, memmove and memcmp. It also adds friction when loading memory values and converting them to other types, producing implicit conversions that are hard to identify and reason about.

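To make the poison-bit problem concrete, here is a minimal toy model, assuming a simplified abstract memory in which every byte carries its concrete bits plus a mask of which bits are poison. All names (AbsByte, AbsInt, load_int, store_int, copy_via_int, copy_via_bytes) are hypothetical and are not LLVM APIs; pointer provenance is deliberately left out. The sketch contrasts a memcpy lowered to an integer load/store pair, where one poison bit poisons the whole copied value, with a copy through byte-typed loads/stores, which preserves each byte exactly.

```python
from dataclasses import dataclass


# Hypothetical toy model, not LLVM's implementation: each abstract memory byte
# carries its concrete bits plus a mask marking which of its 8 bits are poison.
# Pointer provenance is intentionally not modeled.
@dataclass(frozen=True)
class AbsByte:
    bits: int         # concrete value, 0..255 (meaningless where poisoned)
    poison_mask: int  # bit set => that bit is poison


@dataclass(frozen=True)
class AbsInt:
    # In this model an integer is either fully defined or entirely poison.
    value: int
    is_poison: bool


def load_int(mem: list[AbsByte]) -> AbsInt:
    """Load len(mem) bytes as one little-endian integer.

    A single poison bit anywhere makes the whole integer poison."""
    if any(b.poison_mask for b in mem):
        return AbsInt(0, True)
    value = sum(b.bits << (8 * i) for i, b in enumerate(mem))
    return AbsInt(value, False)


def store_int(v: AbsInt, width: int) -> list[AbsByte]:
    """Store an integer as `width` bytes; a poison integer poisons every bit."""
    if v.is_poison:
        return [AbsByte(0, 0xFF) for _ in range(width)]
    return [AbsByte((v.value >> (8 * i)) & 0xFF, 0) for i in range(width)]


def copy_via_int(src: list[AbsByte]) -> list[AbsByte]:
    # memcpy lowered to an integer load followed by an integer store.
    return store_int(load_int(src), len(src))


def copy_via_bytes(src: list[AbsByte]) -> list[AbsByte]:
    # memcpy lowered to byte-typed loads/stores: every byte, including its
    # poison mask, is copied unchanged.
    return list(src)


def poison_bits(mem: list[AbsByte]) -> int:
    # Count how many individual bits in the buffer are poison.
    return sum(bin(b.poison_mask).count("1") for b in mem)


# Three defined bytes and one byte whose high nibble is poison, e.g. a
# partially initialized bit-field.
src = [AbsByte(0x11, 0), AbsByte(0x22, 0), AbsByte(0x33, 0), AbsByte(0x04, 0xF0)]
print(poison_bits(src), poison_bits(copy_via_int(src)), poison_bits(copy_via_bytes(src)))
# prints: 4 32 4
```

In this simplified model, the integer-based copy turns 4 poison bits into 32, while the byte-based copy leaves the source unchanged, which mirrors the argument above for why a dedicated byte type is needed to implement memcpy soundly.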
The two core problems stemming from the absence of a proper type to access and manipulate raw memory, ...

First seen: 2025-09-09 05:50

Last seen: 2025-09-09 09:53