by unilynx
1824 days ago
I read it as "DNS software changed, that worked fine, but it turns out we sometimes generate a broken database - not often enough to see it hit during canary, but devastating when it finally happened". GP also notes that this database changed perhaps every 30 seconds.

Just a few guesses: if you have a process that corrupts a random byte once every 100,000 runs, and you run it every 30 seconds, it might take over three weeks before you're at 50% odds of having seen it happen. And if that used to be a text or JSON database, flipping a random bit might not even corrupt anything important. Or if the code swallows the exception at some level, it might even self-heal after 30 seconds when new data comes in, causing an unnoticed blip in the monitoring, if anything at all.

Now I don't know what binary pack does exactly, but if you were to replace the above process with something that compresses data, a flipped bit will corrupt a lot more data, often everything from that point forwards (whereas text or JSON is pretty self-synchronizing). And if your new code falls over completely when that happens, there's no more self-healing. I can totally imagine missing an event like that during canary testing.
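
To put a rough number on the 50% odds guess above, here is a back-of-the-envelope sketch. It assumes each run fails independently with the same probability, which is a simplification; the 1-in-100,000 rate and the 30 second interval are just the illustrative figures from this comment, not measured values, and `runs_until_probability` is a made-up helper name.

```python
import math


def runs_until_probability(p_failure: float, target: float = 0.5) -> int:
    """Smallest n such that 1 - (1 - p_failure)**n >= target.

    Assumes every run fails independently with probability p_failure.
    """
    # log1p(-p) computes log(1 - p) accurately for small p.
    return math.ceil(math.log(1.0 - target) / math.log1p(-p_failure))


# Illustrative figures from the comment, not real measurements.
P_CORRUPT = 1 / 100_000      # one corrupting run per 100,000
INTERVAL_SECONDS = 30        # database regenerated every 30 seconds

runs = runs_until_probability(P_CORRUPT)
days = runs * INTERVAL_SECONDS / 86_400

print(f"{runs} runs, about {days:.1f} days")
```

Under those assumptions it comes out to roughly 24 days of continuous running before you have even odds of having hit the bug once, which is easily longer than a canary window.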