by Ozzie_osman 7 days ago

  We sharded over 20 TB that we know about.
This is probably a typo, right? 20TB isn't that big. I would imagine they've sharded a lot more than that.

If you think 20TB "isn't that big" I want to know what size of DBs you're working with 0_0
It's big, but it's not so big that it wouldn't fit on SSDs in one particularly beefy server (two for redundancy). Sharding this would be more about the transaction rate. Actually, sharding would always be about the transaction rate.
It doesn’t even remotely need to fit on one SSD with logical volume management (or RAID).
I mean yes, for a single DB it's large, but if you're thinking about sharding you're probably in the tens of TBs, and if you're a company offering sharding you've probably sharded larger workloads.
It's really not that big for a Postgres DB in a lot of places, honestly.
For a vast majority of use cases 20TB is positively enormous.
RDS caps out at 64 TB unless you use Aurora, so 20 TB is totally manageable without sharding.
This product is for Postgres deployments that are so large they need to be sharded. For these use cases, I think 20TB is about normal.
Yes. But for most workloads it is not much for PostgreSQL. You often will not have to shard at all.
That article seems to suggest 20TB total across the dozen deployments in prod.
Sure, but 20TB in “the only database you need” is mere hours' or minutes' worth of data for many workflows.
If your working set is 20 TB, then it's pretty big. Each database has its own mix of hot/cold data, so it's impossible to compare without more information. A better measure might be IOPS. RDS has fairly low maximum IOPS unless you spend a lot more for provisioned IOPS or use Aurora.
You are correct. As a point of comparison: almost ten years ago at Segment we had a single Aurora PostgreSQL instance with ~50 TB of data. It was used to index potential identity data in a much larger corpus of files stored in S3.