Every second, 84 million HTTP requests hit Cloudflare across our fleet of data centers in 330 cities. At that scale, even the rarest of bugs can show up frequently. In fact, it was our scale that recently led us to discover a bug in Go's arm64 compiler that causes a race condition in the generated code.

This post breaks down how we first encountered the bug, how we investigated it, and how we ultimately traced it to its root cause.

Investigating a strange panic

We run a service in our network that configures the kernel to handle traffic for some products, such as Magic Transit and Magic WAN. Our monitoring watches this service closely, and it started to observe very sporadic panics on arm64 machines.

The first one we saw was a fatal error stating that traceback did not unwind completely. That error suggests that invariants were violated while traversing the stack, likely because of stack corruption. After a brief investigation, we concluded that it was probably rare stack memory corruption. This was a largely idle control plane service where unplanned restarts have negligible impact, so we felt that following up was not a priority unless it kept happening.

And then it kept happening.

When we first saw this bug, we noticed that the fatal errors correlated with recovered panics. These were caused by some old code that used panic/recover for error handling. At this point, our theory was:

- All of the fatal panics happen within stack unwinding.
- We correlated an increased volume of recovered panics with these fatal panics.
- Recovering a panic unwinds goroutine stacks to call deferred functions.
- A related Go issue (#73259) reported an arm64 stack unwinding crash.
- Let’s stop using panic/recover for error handling and wait out the upstream fix?

So we did that, and watched the fatal panics stop as the release rolled out. With the fatal panics gone, our theoretical mitigation seemed to work, and this was no longer our problem.
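To make the difference between the two error-handling styles concrete, here is a minimal sketch built around a hypothetical parser that sums numeric fields. The names mustAtoi, parseError, sumWithRecover and sumWithErrors are illustrative assumptions, not code from our service. The first version reports a bad field by panicking and recovering in a deferred function, so every failure goes through the runtime's panic machinery, which runs deferred calls and unwinds the goroutine stack. The second version returns errors normally, so the failure path is just ordinary function returns.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseError is a hypothetical wrapper type used to tag panics that carry
// an ordinary error, so the recover site can tell them apart from real bugs.
type parseError struct{ err error }

// mustAtoi panics instead of returning an error: the panic/recover
// "error handling" style described above.
func mustAtoi(s string) int {
	n, err := strconv.Atoi(s)
	if err != nil {
		panic(parseError{err})
	}
	return n
}

// sumWithRecover converts a panic back into an error. When mustAtoi panics,
// the runtime runs deferred calls while the panic propagates; the deferred
// closure below calls recover(), which stops the panic and lets this
// function return normally with err set.
func sumWithRecover(fields []string) (total int, err error) {
	defer func() {
		if r := recover(); r != nil {
			pe, ok := r.(parseError)
			if !ok {
				panic(r) // not one of ours: keep panicking
			}
			total, err = 0, pe.err
		}
	}()
	for _, f := range fields {
		total += mustAtoi(f)
	}
	return total, nil
}

// sumWithErrors is the same logic using plain error returns, so a bad
// field never triggers a panic or the recover-driven stack unwinding.
func sumWithErrors(fields []string) (int, error) {
	total := 0
	for _, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return 0, fmt.Errorf("parsing %q: %w", f, err)
		}
		total += n
	}
	return total, nil
}

func main() {
	in := []string{"1", "2", "x"}
	if _, err := sumWithRecover(in); err != nil {
		fmt.Println("recover style:", err)
	}
	if _, err := sumWithErrors(in); err != nil {
		fmt.Println("error-return style:", err)
	}
}
```

Both functions give callers the same contract, a total or an error, but only the first one exercises the stack-unwinding path on every bad input, which is why dropping the panic/recover pattern looked like a reasonable mitigation at the time.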
We subscribed to the upstream issue so we could update when it was resolved and put it ...
First seen: 2025-10-08 15:14
Last seen: 2025-10-09 16:20