Actually, if we were running into cases where a panic that is really happening in production isn't being logged, then the first thing to note is that we need to improve our observability. The issue may or may not be recoverable, but it should be logged. If nothing else, it should show up in the logs as a service crash, which is also something that service owners monitor and get alerts on.

The advantage of NilAway is not just detecting nil panic crashes after the fact (as you note, we should always be able to detect those eventually, once they happen!), but detecting them early enough that they don't make it to users. If the tool had been online when that panic was first introduced, it would have been fixed before ever showing up in the logs. (Presumably, at least! The tool is not currently blocking, and developers can mistake a real warning for a false positive. False positives do exist, for reasons both fundamental and related to features that are still being added.)
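To make that concrete, here is a minimal sketch of the kind of code I have in mind. The names (`Config`, `lookup`, `nameUnchecked`, `nameChecked`, `didPanic`) are made up for illustration and are not taken from any real codebase. The unchecked version compiles cleanly and only fails at runtime for a missing key, which is the sort of nil flow a static nil checker is meant to report before the code ships; I am not claiming this is the exact diagnostic NilAway would print.

```go
package main

import "fmt"

// Config is a hypothetical type used only for this illustration.
type Config struct {
	Name string
}

// lookup returns nil when the key is unknown, so callers must check the result.
func lookup(configs map[string]*Config, key string) *Config {
	c, ok := configs[key]
	if !ok {
		return nil
	}
	return c
}

// nameUnchecked dereferences the result without a nil check. The compiler
// accepts it, but it panics at runtime whenever the key is missing.
func nameUnchecked(configs map[string]*Config, key string) string {
	return lookup(configs, key).Name
}

// nameChecked guards the dereference and reports whether the key was found.
func nameChecked(configs map[string]*Config, key string) (string, bool) {
	c := lookup(configs, key)
	if c == nil {
		return "", false
	}
	return c.Name, true
}

// didPanic runs f and reports whether it panicked, so the demo below can
// show the failure without crashing the program.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	configs := map[string]*Config{"prod": {Name: "production"}}

	name, ok := nameChecked(configs, "prod")
	fmt.Println(name, ok)

	_, ok = nameChecked(configs, "missing")
	fmt.Println(ok)

	// The unchecked path only blows up once a missing key reaches it.
	fmt.Println(didPanic(func() { _ = nameUnchecked(configs, "missing") }))
}
```

Without static analysis, the first sign of the bug in `nameUnchecked` is the panic itself, in whatever environment first sends it an unknown key.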

But, in the big picture, this is the same general argument as: "Why do you want a statically typed language if a dynamically typed one will also inform you of the mismatch at runtime and crash?" "Well, because you want to know about the issue before it crashes."

Beyond not making it all the way to prod, there is also a big benefit to detecting issues early in the development lifecycle, simply in terms of the effort required to address them: 'while typing the code' beats 'while compiling and testing locally' beats 'at code review time' beats 'during the deployment flow or in staging' beats 'after the fact, from logs/alerts in production', which itself beats 'after the fact, from user complaints after a major outage'. NilAway currently runs at the code review stage for most internal users, but it is also fast enough to run during local builds (currently that requires all pre-existing warnings in the code to either be resolved or marked for suppression, though, which is why this mode is less common).