The rapid growth of data-intensive and computationally demanding workloads—ranging from large-scale machine learning and high-performance computing (HPC) simulations [10, 13, 19] to graphics rendering [12] and large language models [6]—has increasingly relied on GPU acceleration to achieve desired performance and throughput. As these GPU-powered systems scale, precise observability and low-overhead instrumentation become essential for diagnosing performance bottlenecks, optimizing resource usage, and identifying anomalous behavior at runtime. Existing GPU instrumentation approaches, such as NVIDIA's CUDA Profiling Tools Interface (CUPTI) [11] and binary-instrumentation frameworks like NVBit [14], offer insights into GPU performance but often incur substantial overhead or require invasive modifications that interrupt GPU kernels. Meanwhile, Linux's extended Berkeley Packet Filter (eBPF) has emerged as a powerful observability framework, traditionally employed in network analytics [5] and system profiling [16, 18]. eBPF allows lightweight, sandboxed instrumentation programs to execute in kernel space without intrusive kernel changes or performance penalties. However, existing eBPF techniques remain CPU-centric, constrained by frequent kernel-user space context switches and limited visibility into the GPU execution model. Applying eBPF to GPU workloads presents unique challenges: GPU execution models feature asynchronous kernel launches, parallel thread execution, and specialized memory hierarchies, making instrumentation significantly more complex and prone to high overhead. Furthermore, eBPF's inherent CPU-centric design does not directly extend to GPU hardware without novel translation and runtime adaptation strategies. To bridge this gap, we introduce eGPU, the first observability framework and eBPF runtime that dynamically extends eBPF instrumentation into GPU kernels via runtime PTX injection. Specifically, eGPU compiles eBPF bytecode into NVIDIA's Parallel Threa...
First seen: 2025-04-10 13:44
Last seen: 2025-04-10 13:44