| Great writeup (including the human cost, e.g. loss / lack of sleep, which in my experience has a huge impact on complicated incident resolution). Here’s what jumped out at me: “The new account was created in our database with a null value in the URI field.” Almost every time I see a database-related postmortem — and I have seen a lot of them — NULL is lurking somewhere in the vicinity of the crime scene. Even if NULL sometimes turns out not to be the killer, it should always be brought in for questioning. My advice is: never rely on NULL as a sentinel value, and if possible, don’t allow it into the database at all. Whatever benefits you think you might gain, they will inevitably be offset by a hard-to-find bug, quite possibly years later, where some innocuous-seeming statement expects either NULL or NOT NULL and the results are unexpected (often due to drift in the semantics of the data model). Although this was a race condition, if the local accounts and the remote accounts were affirmatively distinguished by type, the order of operations may not have mattered (and the account merge code could have been narrowly scoped). |
Null is a perfectly valid value for data, and should be treated as such. A default value (e.g. -1 for a Boolean or an empty for a string) can make your system appear to work where NULL would introduce a runtime error, but that doesn't mean your system is performing as expected, it just makes it quieter.
I know it's tempting to brush NULL under the rug, but nothing is just as valid a state for data as something, and systems should be written generally to accommodate this.