by MaulingMonkey 3521 days ago
> I'm not a fan of defensive programming as it can hide an obvious bug for a long time (I consider it a Good Thing that the program crashed otherwise we might have gone months, or even years, without noticing the actual bug).

I've had segfaults "hidden" for a long time because my artist coworkers weren't reporting crashes in their tools. They assumed a 5 minute fix was something really complicated. Non-defensive programming is no panacea here. Worse, non-defensive programming often meant crashes well after the initial problem anyways, when all sane context was lost.

My takeaway here is that I need to automatically collect crashes - and other failures - instead of relying on end users to report the problem. This is entirely compatible with defensive programming - right now I'm looking at sentry.io and its competitors (and what I might consider rolling myself) to hook up as a reporting back end for yet another assertion library (since none of them bother with C++ bindings.) On a previous codebase, we had an assert-ish macro:

  ..._CHECKFAIL( precondition, description, onPreconditionFailed );

Which let code like this (to invent a very bad example) not fatally crash:

  ..._CHECKFAIL( texture, "Corrupt or missing texture - failed to load [" << texturePath << "]", return PlaceholderTexture() );
  return texture;

This beats a crash deep in my rendering pipeline, minutes after loading, with no context as to which texture might be missing. Make it as annoying as a crash in your internal builds and it will be triaged as a crash. Or possibly even more severely, if simply hitting the assert automatically opens a bug in your DB, assigns your leads/managers to triage it, and CCs QA, whoever committed last, and everyone who reviewed the last commit ;)
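For illustration, a macro in this spirit might look roughly like the sketch below. It is not the original macro (whose full name is elided above): `SKETCH_CHECKFAIL`, `checkfail_reporter`, `Texture`, `placeholder_texture` and `load_texture` are all made-up names, and the default reporter only writes to stderr where a real one might forward to a crash collector or bug tracker.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical reporting hook. The default writes to stderr; a real build
// might replace it with something that talks to a crash collector.
inline std::function<void(const std::string&)>& checkfail_reporter() {
    static std::function<void(const std::string&)> reporter =
        [](const std::string& message) { std::cerr << message << '\n'; };
    return reporter;
}

// If `precondition` is false: build a message from the streamed
// `description`, hand it to the reporter, then run `onPreconditionFailed`
// (typically a `return` with a fallback value).
#define SKETCH_CHECKFAIL(precondition, description, onPreconditionFailed)     \
    do {                                                                       \
        if (!(precondition)) {                                                 \
            std::ostringstream sketch_checkfail_stream_;                       \
            sketch_checkfail_stream_ << __FILE__ << '(' << __LINE__ << "): "   \
                << "check failed: " #precondition ": " << description;         \
            checkfail_reporter()(sketch_checkfail_stream_.str());              \
            onPreconditionFailed;                                              \
        }                                                                      \
    } while (0)

// Stand-in texture type, only here to make the example compile.
struct Texture {
    std::string name;
    bool valid;
    explicit operator bool() const { return valid; }
};

inline Texture placeholder_texture() { return Texture{"placeholder", true}; }

inline Texture load_texture(const std::string& texturePath) {
    // Stand-in loader: pretend that only "ok.png" exists on disk.
    Texture texture{texturePath, texturePath == "ok.png"};
    SKETCH_CHECKFAIL(texture,
        "Corrupt or missing texture - failed to load [" << texturePath << "]",
        return placeholder_texture());
    return texture;
}
```

The failure is reported at the point where the path is still known, and the caller gets a placeholder instead of a bad texture. Swapping the reporter is where the "annoying in internal builds" behavior would go.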

> Logging is an art.

You're right, and it's hard. However. It's very easy to do better than not logging at all.

And I think something similar applies to defensive programming. You want null to crash your program? Do so explicitly, maybe with an error message describing what assumption was violated, and preferably in release builds too, instead of adding a possible security vulnerability to your codebase: http://blog.llvm.org/2011/05/what-every-c-programmer-should-... . Basically, always-enabled fatal asserts.

This might even be a bit easier than logging - it's hard to pack too much information into a fatal assert. After all, there's only going to be one of them per run.
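A minimal sketch of what an always-enabled fatal assert could look like, assuming nothing beyond the standard library. The names `SKETCH_FATAL_ASSERT`, `fatal_assert_message`, `fatal_assert_fail` and `first_byte` are invented for this example. Unlike the standard `assert`, it does not check `NDEBUG`, so it stays active in release builds.

```cpp
#include <cassert>   // standard assert, compiled out under NDEBUG (for contrast)
#include <cstdio>
#include <cstdlib>
#include <sstream>
#include <string>

// Builds the one-line report for a failed assertion.
inline std::string fatal_assert_message(const char* expression,
                                        const char* assumption,
                                        const char* file, int line) {
    std::ostringstream out;
    out << file << ':' << line << ": fatal assert failed: " << expression
        << " (" << assumption << ')';
    return out.str();
}

// Prints the report to stderr and terminates the process.
[[noreturn]] inline void fatal_assert_fail(const char* expression,
                                           const char* assumption,
                                           const char* file, int line) {
    const std::string message =
        fatal_assert_message(expression, assumption, file, line);
    std::fputs(message.c_str(), stderr);
    std::fputc('\n', stderr);
    std::abort();
}

// Deliberately not wrapped in #ifndef NDEBUG: active in every build.
#define SKETCH_FATAL_ASSERT(condition, assumption)                            \
    ((condition) ? static_cast<void>(0)                                       \
                 : fatal_assert_fail(#condition, assumption, __FILE__, __LINE__))

// Example: crash explicitly on null instead of dereferencing it.
inline int first_byte(const unsigned char* data) {
    SKETCH_FATAL_ASSERT(data != nullptr, "caller must pass a non-null buffer");
    return data[0];
}
```

Since the process dies on the first failure, there is little cost to making the message as descriptive as you like.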

2 comments

Please, please, don't roll your own. It seems like an easy problem at a glance, but it's far from it. The more fragmentation in these communities the worse off we all are. Sentry's totally open source, and we have generous free tiers on the hosted platform. Happy to talk more about this in detail, but if there are things you don't feel are being solved let us know.
> Please, please, don't roll your own. It seems like an easy problem at a glance, but it's far from it. The more fragmentation in these communities the worse off we all are.

I've rolled my own before, for enough of the pieces involved here, to confirm you're entirely correct. There's a reason I'm looking at your tech ;)

> Happy to talk more about this in detail, but if there are things you don't feel are being solved let us know.

No mature/official C or C++ SDK. Built-in support for native Windows and Android callstacks would be great - I see you've already done some work for handling OS X symbols inside the Cocoa bindings at least. Plus hooks to let me integrate my own callstack collection for other platforms you haven't signed the NDAs for (e.g. consoles) and whatever scripting languages we've embedded.

All the edge cases. I want to receive events:

* When my event reports a bug in my connection loss handling logic (requiring resending it later when the connection is restored.)

* When my event reports I've run out of file handles (requiring preopening files or thoroughly testing the error handling.)

* When I run out of memory (requiring preallocating - and probably reserving some memory to free in case writing a file or socket tries to allocate...)

* When I've detected memory corruption.

* When I've detected a deadlock.
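To make the connection-loss case in the list concrete, here is a rough sketch of holding events locally and resending them once sending works again. `PendingEventQueue` and its methods are hypothetical names, not part of any Sentry SDK, and the `try_send` callback stands in for whatever transport is actually used. A real version would probably also persist the queue to disk.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical bounded queue of events that could not be sent yet.
class PendingEventQueue {
public:
    explicit PendingEventQueue(std::size_t max_pending)
        : max_pending_(max_pending) {}

    // Queues an event, dropping the oldest one if the queue is full.
    void enqueue(std::string event) {
        if (max_pending_ == 0) return;
        if (pending_.size() == max_pending_) pending_.pop_front();
        pending_.push_back(std::move(event));
    }

    // Tries to send queued events oldest-first and stops at the first
    // failure, leaving the rest queued. Returns how many were sent.
    std::size_t flush(const std::function<bool(const std::string&)>& try_send) {
        std::size_t sent = 0;
        while (!pending_.empty() && try_send(pending_.front())) {
            pending_.pop_front();
            ++sent;
        }
        return sent;
    }

    std::size_t size() const { return pending_.size(); }

private:
    std::size_t max_pending_;
    std::deque<std::string> pending_;
};
```

Calling `flush` on a timer or on a "connection restored" notification would retry everything still pending. The bound keeps a long outage from turning into an out-of-memory report of its own.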

Some of these will be project specific - because it's such an impossibly broad topic that sentry's SDKs can't possibly handle them all.
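As one example of the project-specific side, the "reserve some memory to free" idea for the out-of-memory case could be sketched with the standard `std::set_new_handler` hook. `EmergencyReserve`, `emergency_reserve`, `on_out_of_memory` and `install_out_of_memory_handler` are invented names, and the 64 KiB size is an arbitrary assumption. Whether the freed block is actually enough for a reporter to do its work depends on the allocator and on the reporter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical helper: a block of memory held back for emergencies.
class EmergencyReserve {
public:
    explicit EmergencyReserve(std::size_t bytes) : block_(new char[bytes]) {}

    // Frees the reserve. Returns false if it was already released.
    bool release() {
        if (!block_) return false;
        block_.reset();
        return true;
    }

    bool available() const { return block_ != nullptr; }

private:
    std::unique_ptr<char[]> block_;
};

// Process-wide reserve. The size is an arbitrary choice for this sketch.
inline EmergencyReserve& emergency_reserve() {
    static EmergencyReserve reserve(64 * 1024);
    return reserve;
}

// Called by operator new when an allocation fails.
inline void on_out_of_memory() {
    if (emergency_reserve().release()) {
        // Returning makes operator new retry the allocation. A real
        // handler would also note that an out-of-memory report is due.
        return;
    }
    std::abort();  // reserve already spent: nothing left to free
}

inline void install_out_of_memory_handler() {
    emergency_reserve();  // allocate the reserve now, while memory is available
    std::set_new_handler(on_out_of_memory);
}
```

Calling `install_out_of_memory_handler()` early in startup matters here: the reserve has to exist before memory runs out, not be created lazily inside the handler.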

No hard crash collection - this might be considered outside of sentry.io's scope, though? It's also hideously platform specific to the point where some of the tools will be covered by console NDAs again. Even on Windows it's fiddly as heck - I've seen the entire pipeline of configuring registry keys to save .mdmp files, using scripts to run ngen to create symbols for the unique-per-machine mscorlib.ni.dll and company - so you can resolve crashdumps with mixed C++/C# callstacks - and then using cdb to resolve the same callstack in multiple ways... it's a mess. I could still use the JSON API to report crash summaries, though.

On a less negative note, I see breadcrumbs support landed in unstable for the C# SDK.

EDIT: And then there's all the fiddly nice-to-haves, ease-of-use shortcuts, local error reporting, etc. - some of which will also be project specific - but rest assured, the last thing I want to do is retread the same ground that sentry.io already covers. And where there are gaps, pull requests are one of the easier options...

At work, we regard exception collecting as essential for both development and production - if an application reaches internal QA, it's already reporting to an exception collector. This is separate to whatever logging is going on.

Sentry.io is one of the services that we use, but I don't have any connection beyond being a customer. I would echo the sentiment about not rolling your own, though: you want your exception collector to be a thoroughly battle-tested bit of code, and if it's reporting to a remote service, you want that to be as separate as possible from the application infrastructure, and extremely reliable.