| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by jimwhitson 4948 days ago
	At IBM, we were very keen on what we called 'FFDC' - 'first- failure data capture'. This meant having enough layers of error-detection, ideally all the way down to the metal, so that failures could be detected cleanly and logged before (possibly) going down, allowing our devs to reproduce and fix customer bugs. Naturally it wasn't perfect, and it depending on lots of very tedious planning meetings, but on the stuff I worked with (storage devices mainly) it was remarkably effective. In my experience in more 'agile' firms - startups, web dev shops and so on - it would be very hard to make a scheme like this work well, because of all the grinding bureaucracy, fiddly spec-matching and endless manual testing required, as well as the importance of controlling - and deeply understanding - the whole stack. Nonetheless, for infrastructure projects like Redis, I can see value in having engineering effort put explicitly into making 'prettier crashes'.

1 comments

spec-matching is a specialty of good agile companies, but web-dev shops don't usually write their own db/server software.