|
|
|
|
|
by rdtsc
176 days ago
|
|
This is about not even trying durability before returning a result ("Commit-to-disk on a single system is [...] unnecessary") it's hoping that servers won't crash and restart together: some might fail but others will eventually commit. However that assumes a subset of random (uncoordinated) hardware failures, maybe a cosmic ray blasts the ssd controller. That's fine, but it fails to account for coordinated failure where, a particular workload leads to the same overflow scenario on all servers the same. They all acknowledge the writes to the client but then all crash and restart. |
|
Suppose you have each server commit the data "to disk" but it's really a RAID controller with a battery-backed write cache or enterprise SSD with a DRAM cache and an internal capacitor to flush the cache on power failure. If they're all the same model and you find a usage pattern that will crash the firmware before it does the write, you lose the data. It's little different than having the storage node do it. If the code has a bug and they all run the same code then they all run the same bug.