| Skipping flushing the local disk seems rather silly to me: - A modern high end SSD commits faster than the one way time to anywhere much farther than a few miles away. (Do the math. A few tens of microseconds specified write latency is pretty common. NVDIMMs (a sadly dying technology) can do even better. The speed of light is only so fast. - Unfortunate local correlated failures happen. IMO it’s quite nice to be able to boot up your machine / rack / datacenters and have your data there. - Not everyone runs something on the scale of S3 or EBS. Those systems are awesome, but they are (a) exceedingly complex and (b) really very slow compared to SSDs. If I’m going to run an active/standby or active/active system with, say, two locations, I will flush to disk in both locations. |
It is. Coordinated failures shouldn't be a surprise these days. It's kind of sad to here that from an AWS engineer. Same data pattern fills the buffers and crashes multiple servers, while they were all "hoping" that others fsynced the data, but it turns out they all filled up and crashed. That's just one case there are others.