by z1mm32m4n 3631 days ago
With a graph that's always 0, I'd be scared that my verification code was accidentally wrong. Do you ever intentionally inject missing hashes somewhere to see if your graph blips? How would you feign that a hash was missing?

Yup, we do this, and yes, the verifier has been broken before (during early development, not at production scale).

We do a number of things here. One is taking down enough storage nodes in a single region to make blocks appear "missing" in that region and force an automatic failover to the other region (this is transparent to users, apart from slightly higher latency). Another is running more direct/risky checks in our Staging cluster (we never mess with data in our main production cluster).
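To make the node-outage idea concrete, here is a toy Python sketch. Everything in it is an assumption made for illustration: `StorageNode`, `count_missing`, and `verifier_detects_outage` are hypothetical names, and the in-memory model stands in for real storage nodes rather than describing the actual verifier. It shows a check that should read 0 when healthy, and a drill that confirms the check turns non-zero when a whole region is made unreachable.

```python
import hashlib


class StorageNode:
    """Hypothetical in-memory stand-in for one storage node."""

    def __init__(self):
        self.blocks = {}   # hash -> block bytes
        self.up = True     # False simulates a node taken out of service

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blocks[digest] = data
        return digest

    def has(self, digest):
        # A node that is down cannot serve any block, so its blocks look missing.
        return self.up and digest in self.blocks


def count_missing(expected_hashes, nodes):
    """Count expected hashes that no reachable node in the region can serve."""
    return sum(1 for h in expected_hashes if not any(n.has(h) for n in nodes))


def verifier_detects_outage(expected_hashes, nodes):
    """Take every node in the region down, check that the verifier reports
    all blocks as missing, then restore each node's previous state."""
    previous = [n.up for n in nodes]
    for n in nodes:
        n.up = False
    try:
        missing = count_missing(expected_hashes, nodes)
        return missing > 0 and missing == len(expected_hashes)
    finally:
        for n, was_up in zip(nodes, previous):
            n.up = was_up
```

If `verifier_detects_outage` ever returns False, the flat-zero graph cannot be trusted, because the check itself is not noticing missing blocks. No data is deleted in this sketch; the nodes are only marked unreachable, which mirrors the point that blocks merely appear "missing".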

In reality, a large system like this regularly encounters timeouts or failures of sub-components. These are masked by our multi-region redundancy but show up as spikes in the verifiers. The spikes remind us that everything is working, in between more explicit DRT (Disaster Recovery Training) tests.

ABF, Always Be Failing (just a little bit).