| There's a ton of jargon here. Summarized...
Why EBS didn't work:
- EBS charges for allocated capacity, not for what is actually used
- EBS is slow at restores from snapshot (faster to spin up a database from a Postgres backup stored in S3 than from an EBS snapshot in S3)
- EBS only lets you attach 24 volumes per instance
- EBS only lets you resize once every 6–24 hours; you can't shrink or adjust continuously
- Detaching and reattaching EBS volumes can take anywhere from 10 s for healthy volumes to 20 min for failed ones, so failover takes longer
Why all this matters:
- their AI agents all run against ephemeral snapshots, so they constantly destroy and rebuild EBS volumes
What didn't work:
- local NVMe/bare metal: needs 2-3x the nodes for durability, which is too expensive; snapshot restores are too slow
- custom page-server Postgres storage architecture: too complex/expensive to maintain
Their solution:
- block-level copy-on-write (COW)
- volume changes (new/snapshot/delete) are a metadata change
- storage space is logical (effectively infinite) not bound to disk primitives
- multi-tenant by default
- versioned, replicated k/v transactions, horizontally scalable
- independent service layer abstracts blocks into volumes, is the security/tenant boundary, enforces limits
- user-space block device, pins i/o queues to cpus, supports zero-copy, resizing; depends on Linux primitives for performance limits
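To make the "volume changes are a metadata change" point concrete, here is a toy sketch of block-level copy-on-write. It is not their implementation: `BlockStore`, `Volume`, and `BLOCK_SIZE` are hypothetical names, everything lives in memory, and a real system would presumably version the block map instead of copying it.

```python
from dataclasses import dataclass, field
from itertools import count

BLOCK_SIZE = 4096  # assumed block size, for illustration only


@dataclass
class BlockStore:
    """Hypothetical shared store: blocks are written once, never modified."""
    blocks: dict[int, bytes] = field(default_factory=dict)
    _ids: count = field(default_factory=count)

    def put(self, data: bytes) -> int:
        block_id = next(self._ids)
        self.blocks[block_id] = data
        return block_id


@dataclass
class Volume:
    """A volume is only metadata: logical block number -> block id."""
    store: BlockStore
    block_map: dict[int, int] = field(default_factory=dict)

    def write(self, lbn: int, data: bytes) -> None:
        # Copy-on-write: store a new block and repoint this volume's entry;
        # other volumes that reference the old block are unaffected.
        self.block_map[lbn] = self.store.put(data)

    def read(self, lbn: int) -> bytes:
        block_id = self.block_map.get(lbn)
        if block_id is None:
            # Unwritten blocks read as zeros, so logical size is free.
            return bytes(BLOCK_SIZE)
        return self.store.blocks[block_id]

    def snapshot(self) -> "Volume":
        # A snapshot/fork copies the map only; no block data is copied.
        return Volume(self.store, dict(self.block_map))


store = BlockStore()
base = Volume(store)
base.write(0, b"v1")
fork = base.snapshot()
fork.write(0, b"v2")
print(base.read(0), fork.read(0))
```

The fork sees its own write while the base volume still reads the old block, and deleting a volume would just mean dropping its map.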
Performance stats (single volume):
- (latency/IOPS benchmarks: 4 KB blocks; throughput benchmarks: 512 KB blocks)
- read: 110,000 IOPS and 1.375 GB/s (bottlenecked by network bandwidth)
- write: 40,000–67,000 IOPS and 500–700 MB/s, synchronously replicated
- single-block read latency ~1 ms, write latency ~5 ms
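A quick sanity check on the read numbers, as a rough sketch. It assumes "4 KB" means 4096 bytes and "GB/s" is decimal, neither of which the summary states. The helper names are made up for the example.

```python
def iops_to_bytes_per_s(iops: float, block_bytes: int) -> float:
    """Bandwidth implied by an IOPS figure at a fixed block size."""
    return iops * block_bytes


def bytes_per_s_to_gbit(bytes_per_s: float) -> float:
    """Convert bytes per second to gigabits per second (decimal)."""
    return bytes_per_s * 8 / 1e9


# 110,000 read IOPS at 4 KB blocks (assumed 4096 bytes)
small_block_read = iops_to_bytes_per_s(110_000, 4096)

# 1.375 GB/s from the 512 KB throughput benchmark, as link bandwidth
large_block_gbit = bytes_per_s_to_gbit(1.375e9)

print(f"4 KB reads move about {small_block_read / 1e6:.0f} MB/s")
print(f"1.375 GB/s is about {large_block_gbit:.1f} Gbit/s on the wire")
```

So the IOPS and throughput figures come from different tests: the 4 KB run moves well under half of 1.375 GB/s, and the large-block figure works out to 11 Gbit/s, which fits the note about being bottlenecked by network bandwidth.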
|
EBS volume attachment is typically ~11s for GP2/GP3 and ~20-25s for other types.
1 ms read / 5 ms write latencies seem high for 4 KB blocks. IO1/IO2 is typically ~0.5 ms for both reads and writes, and GP2/GP3 ~0.6 ms read and ~0.94 ms write.
References: https://cloudlooking.glass/matrix/#aws.ebs.us-east-1--cp--at... https://cloudlooking.glass/matrix/#aws.ebs.*--dp--rand-*&aws...
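One way to relate the latency and IOPS figures quoted above is Little's law (in-flight requests = throughput x latency). This is a back-of-the-envelope sketch, and it assumes the quoted latencies hold at the quoted IOPS, which the benchmarks may not have measured together. The function name is made up.

```python
def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's law: average in-flight requests = arrival rate x latency."""
    return iops * latency_ms / 1000


# Figures quoted in the summary above
read_depth = outstanding_ios(110_000, 1)    # reads at ~1 ms
write_depth_low = outstanding_ios(40_000, 5)   # writes at ~5 ms
write_depth_high = outstanding_ios(67_000, 5)

# A single synchronous writer (queue depth 1) at 5 ms per write
single_writer_iops = 1000 / 5

print(read_depth, write_depth_low, write_depth_high, single_writer_iops)
```

Under that assumption the headline IOPS need on the order of a hundred or more requests in flight, while a single synchronous writer at 5 ms would top out around 200 writes per second, which is why the per-request latency matters for workloads that cannot queue deeply.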