| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by sgarland 233 days ago
	Yes, it’s the ephemerality that’s the biggest issue. Enterprise-grade SSDs are quite reliable, and typically have PLP so even in the event of a sudden power loss, any queued writes that the drive has accepted - and thus ack’d the fsync() - will be written. Presumably you’d be running some kind of redundancy, likely some flavor of RAID or zRAID (assuming purely local storage here, not a distributed system like Ceph, nor synchronous replication). But in the cloud, if the physical server backing your instance dies, or even if someone accidentally issues a shutdown command, you don’t get that same drive back when the new instance comes up. So a problem that is normally solved by basic local redundancy suddenly becomes impossible, and thus you must either run synchronous or semi-sync replication (the latter is what PlanetScale Metal does), accepting the latency hit from distributed storage, or asynchronous replication and accept some amount of data loss, which is rarely acceptable.

2 comments

samlambert 233 days ago

Agreed on these trade offs. We do both synchronous and semi-synchronous depending on Postgres or MySQL.

link

pas 233 days ago

... sounds like a trivial job for bare metal instances

and that EC2 local NVMe encryption keys are ephemeral is nice against leaks, but not a necessity for other clouds (and not great for resumability, which can really downgrade business continuity scores), and I expect for all the money they ask for it, to be able to keep it relatively secure even across reboots

link

BonoboIO 233 days ago

Or even a bare metal simple server that just does databases with redundant nvme ssd

link