Hacker News new | ask | show | jobs
by krenoten 2950 days ago
It means pay more attention to reliability than pop infrastructure and internet companies (who can offset poor reliability with human attention or intentionally deprioritize it to sell more support contacts) tend to put into these things. Specifically, exhaustive concurrency testing of lock-free algorithm interleavings via ptrace driven scheduling, model-based testing in combination with fault injection, ALICE-style file correctness testing, and for the various distributed modules that sit on top, network simulation combined with lineage driven fault injection. This is all very much a work in progress, and I'd love to work with more people on it!