Do they make any promises about persistence of local NVMe after something like a full-region power outage yet?
Because if a single-region cluster can't do a durable commit, meaning it would be just temporarily unavailable without losing committed data if something like that happened, it's not quite there, unless you still stream a WAL to storage that they do promise will survive a full blackout of all zones that store (part of) the data.
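To make the "still stream a WAL" part concrete, here is a minimal sketch of a commit path that only acknowledges a write after it is fsynced locally and also handed to storage that survives a blackout. Everything in it is hypothetical: `DurableLog`, the `remote_append` callback (a stand-in for whatever object store or replicated log client you would use), and the length-prefixed framing are my assumptions, not anything a specific database or AWS service does.

```python
import os


class DurableLog:
    """Hypothetical commit path: local fsync plus a copy to blackout-safe storage."""

    def __init__(self, local_path, remote_append):
        # Local WAL on the instance-attached NVMe (may not survive a power loss).
        self.local = open(local_path, "ab")
        # Assumed callback that ships a frame to durable storage and raises on failure.
        self.remote_append = remote_append

    def commit(self, record: bytes) -> bool:
        # Length-prefix the record so a reader can find frame boundaries.
        frame = len(record).to_bytes(4, "big") + record
        self.local.write(frame)
        self.local.flush()
        os.fsync(self.local.fileno())  # push the frame out of the OS page cache
        self.remote_append(frame)      # if this raises, the commit is never acked
        return True

    def close(self):
        self.local.close()
```

The point of the ordering is that the caller only sees `True` once both copies exist, so losing the local NVMe costs you availability but not acknowledged data.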
Idk how an AWS region would respond to a power outage, but I have tested this on AWS Outposts, and there, if you power down a rack and then power it back up, the bare-metal instances are not recreated. (I was surprised, as I was expecting the EC2 health check to terminate them, but it does not work like that.)
My understanding is that if you stop/start an instance, your local storage is gone (as the instance might even end up on a different host), but if you just reboot the instance, it should keep the local storage.
That's a good point. I re-ran the benchmark on two instances:
- c8gd.4xlarge - this has a single 950 GB NVMe SSD.
- c5ad.4xlarge - this has 2 x 300 GB disks, which I put in a RAID 0 array. There are no c6ad.4xlarge instances, so this is the closest NVMe-enabled approximation to ClickBench's most popular choice, c6a.4xlarge.
I also added results from my local dev machine, a MacBook M1 Max with 64 GB RAM and 10 cores.
On the cold run, the MacBook is on par with the c5ad.4xlarge. The c8gd.4xlarge is about 2.5x faster on the cold run.
I know this is moving the goalposts, but it's quite interesting that both of these cloud instances with instance-attached storage are still outperformed by the M1 Max (which is 4+ years old) on the cold run. And they would quite likely lose against the latest MacBook Pro with the M5 Pro/Max on both the cold and the hot runs. But that's an experiment for another day.