A Higgs-Bugson in the Linux Kernel

Summary

We recently ran across a strange higgs-bugson in Gord, a critical system that stores and distributes the firm’s trading activity data. (A higgs-bugson is a bug that is reported in practice but is difficult to reproduce. It is named for the Higgs boson, a particle that was theorized in the 1960s but only discovered in 2012.) In this post I’ll walk you through the process I used to debug it. I tried to write down relevant details as they came up, so see if you can guess what the bug is while reading along.

Some useful background information about NFS with Kerberos

The NFS (“Network File System”) protocol is designed for accessing a regular POSIX filesystem over the network. The default security story of NFSv3, which is what we’re using here, is roughly “no security” on an untrusted network. The server only checks whether the client is connecting from a “privileged” port number (i.e. less than 1024), which normally only root can bind. If the client says it’s connecting on behalf of a particular user, the server just trusts the client. What could go wrong?

The other security option for NFS is Kerberos. When used with NFS, Kerberos cryptographically verifies the identity of the user accessing the file.

What’s the bug?

Gord often does large file copies to ship data around. These copies would very rarely fail with -EACCES (Permission denied), even though the permissions on the filesystem were set correctly. Although the copies could be retried, it would be sad to lose the progress already made on these large files. Also, strange errors in data storage are scary! Spurious errors like these could indicate a larger issue.

There was no obvious pattern in which copies failed. Even identical jobs running simultaneously didn’t necessarily fail together. We did have one clue: when we switched Kerberos off in the dev environment (because the error sounded auth-related), the copies never failed. So, maybe something was wrong with the Kerberos credentials?

How does the kernel get your Kerberos credentials?
In a typical ...
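The NFSv3 trust model described in the background section above can be made concrete with a small toy model. This is a sketch under stated assumptions, not code from the Linux NFS server:

- `encode_auth_sys`, `auth_sys_uid`, and `keyed_uid` are hypothetical helpers.
- The credential layout assumes the ONC RPC `authsys_parms` structure (RFC 5531), the default AUTH_SYS flavor that NFSv3 uses.
- `keyed_uid` is a deliberately simplified HMAC stand-in for Kerberos. Real Kerberos over NFS (RPCSEC_GSS) uses tickets and GSS-API checksums, not this scheme.

```python
import hashlib
import hmac
import struct
from typing import Dict, List, Optional

# Binding a source port below 1024 normally requires root on the client.
PRIVILEGED_PORT_LIMIT = 1024


def encode_auth_sys(stamp: int, machinename: str, uid: int, gid: int,
                    gids: List[int]) -> bytes:
    """XDR-encode an AUTH_SYS credential body (assumed authsys_parms layout)."""
    name = machinename.encode("ascii")
    pad = b"\x00" * ((4 - len(name) % 4) % 4)  # XDR pads strings to 4 bytes
    body = struct.pack(">II", stamp, len(name)) + name + pad
    body += struct.pack(">III", uid, gid, len(gids))
    body += b"".join(struct.pack(">I", g) for g in gids)
    return body


def auth_sys_uid(src_port: int, cred: bytes) -> Optional[int]:
    """Toy server-side AUTH_SYS check: only the source port is verified."""
    if src_port >= PRIVILEGED_PORT_LIMIT:
        return None  # reject requests from unprivileged ports
    (name_len,) = struct.unpack_from(">I", cred, 4)  # skip the 4-byte stamp
    offset = 8 + name_len + (4 - name_len % 4) % 4   # skip padded machinename
    (uid,) = struct.unpack_from(">I", cred, offset)
    return uid  # whatever uid the client wrote is trusted as-is


def keyed_uid(session_keys: Dict[int, bytes], uid: int, request: bytes,
              tag: bytes) -> Optional[int]:
    """Toy stand-in for Kerberos (NOT the real protocol): accept the uid only
    if the request carries a valid HMAC under that user's session key."""
    key = session_keys.get(uid)
    if key is None:
        return None  # no shared key, so no way to prove this identity
    expected = hmac.new(key, request, hashlib.sha256).digest()
    return uid if hmac.compare_digest(expected, tag) else None


# Root on any client can bind a low port and claim to be any user, e.g. uid 0.
forged = encode_auth_sys(stamp=1, machinename="client1", uid=0, gid=0, gids=[])
print(auth_sys_uid(800, forged))    # 0
print(auth_sys_uid(40000, forged))  # None
```

In the AUTH_SYS half, the uid is just a number the client writes into the request. In the keyed half, the same forged claim is rejected unless the client can produce a MAC with that user's key. That is roughly the property Kerberos provides with real cryptography.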

First seen: 2025-07-03 02:59

Last seen: 2025-07-03 13:05