Root cause analysis? You're doing it wrong

https://news.ycombinator.com/rss Hits: 7
Summary

Now that we know what an analysis can look like, let’s dive into how to perform it. An accident is any undesirable event. For the purposes of this article, it’s additionally an undesirable event that someone wishes to expend resources (time or money) on preventing in the future.[5] Note that an accident does not have to be “accidental.” Sometimes you deliberately cause one accident to prevent another, much worse, one. A past team of mine once deliberately null-routed traffic to a system because it had an unpredictable memory leak, and for this system, preventing usage entirely was considered better than waiting for it to crash on its own at some inconvenient time.

A system has a goal. We create systems in order to accomplish something. Systems have a purpose.[6] If you have something that looks like a system but lacks a purpose, you might think of it as an organism or something else complicated-but-not-designed-for-a-task.

In addition to a goal, a system comes with constraints. These define the boundaries of what the system is allowed to do in pursuit of its goal. Common constraints on software systems are that they may not use unbounded resources, or that they should continue to function in the absence of an internet connection. Constraints are important for accident analysis because virtually all accidents happen when a constraint (explicit or implicit) is violated or missing.

When analysing a system, we can sometimes point to a subsystem that enforces these constraints, and then we call this subsystem a controller. The controller sends control actions to the system and receives feedback from it, with the purpose of keeping the system within desired constraints. Control is to be interpreted very broadly: redundant components with automatic failover can be modeled as a control action, much like training programmes are a control action. At a low level, control might be technical. At higher levels, control is almost always a social function.
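The relationship between system, constraint, and controller can be sketched as a small simulation. This is only an illustration loosely based on the null-routing anecdote above: the class names, the 500 MB budget, and the leak rate are all made-up assumptions, not anything taken from a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class System:
    """Toy system (hypothetical): leaks memory while it serves traffic."""
    memory_mb: float = 100.0
    accepting_traffic: bool = True

    def step(self) -> None:
        # Assumed leak rate: 150 MB per tick, only while traffic flows.
        if self.accepting_traffic:
            self.memory_mb += 150.0


@dataclass
class Constraint:
    """A named boundary on what the system is allowed to do."""
    name: str
    holds: Callable[[System], bool]


@dataclass
class Controller:
    """Reads feedback from the system and issues control actions."""
    constraints: List[Constraint]
    actions_taken: List[str] = field(default_factory=list)

    def observe_and_act(self, system: System) -> None:
        for constraint in self.constraints:
            # Feedback: check each constraint against the observed state.
            if not constraint.holds(system) and system.accepting_traffic:
                # Control action: deliberately cause a small accident
                # (no traffic) to avoid a worse one (an untimely crash).
                system.accepting_traffic = False
                self.actions_taken.append(
                    f"null-route traffic ({constraint.name} violated)"
                )


def simulate(ticks: int) -> Tuple[System, Controller]:
    system = System()
    controller = Controller(
        [Constraint("memory under 500 MB", lambda s: s.memory_mb < 500.0)]
    )
    for _ in range(ticks):
        system.step()
        controller.observe_and_act(system)
    return system, controller


if __name__ == "__main__":
    system, controller = simulate(ticks=10)
    print(system)
    print(controller.actions_taken)
```

In this sketch the controller is technical, but the same loop (observe feedback, compare against constraints, issue a control action) is meant to describe social controls such as training programmes as well.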
Another importan...

First seen: 2025-10-13 19:25

Last seen: 2025-10-14 03:28