HN Mirror

by Ozzie_osman 7 days ago

  We sharded over 20 TB that we know about.

This is probably a typo, right? 20TB isn't that big. I would imagine they've sharded a lot more than that.

dujuku 7 days ago

If you think 20TB "isn't that big" I want to know what size of DBs you're working with 0_0

inigyou 7 days ago

It's big but it's not so big it wouldn't fit on SSD on one particularly beefy server (two for redundancy). Sharding this would be more about the transaction rate. Actually, sharding would always be about the transaction rate.

ComputerGuru 6 days ago

It doesn’t even remotely need to fit on one SSD with logical volume management (or RAID).

Ozzie_osman 7 days ago

I mean yes, for a single DB it's large, but if you're thinking about sharding you're probably in the tens of TBs, and if you're a company offering sharding you've prob sharded larger workloads.

ubercore 7 days ago

It's really not that big for a postgres db in a lot of places, honestly.

GiorgioG 7 days ago

For the vast majority of use cases 20TB is positively enormous.

mplanchard 7 days ago

RDS caps out at 64 TB unless you use Aurora, so 20 TB is totally manageable without sharding.

returningfory2 7 days ago

This product is for Postgres deployments that are so large they need to be sharded. For these use cases, I think 20TB is about normal.

jeltz 7 days ago

Yes. But for most workloads it is not much for PostgreSQL. You often will not have to shard at all.

tingletech 7 days ago

That article seems to suggest 20TB total over the dozen deployments in prod.

happyopossum 7 days ago

Sure, but 20TB in “the only database you need” is mere hours or minutes worth of data for many workflows.

singron 7 days ago

If your working set is 20 TB, then it's pretty big. Each database has its own mix of hot/cold data, so it's impossible to compare without more information. A better measure might be IOPS. RDS has fairly low maximum IOPS unless you spend a lot more for provisioned IOPS or use Aurora.

rbranson 7 days ago

You are correct. As a point of comparison: almost ten years ago at Segment we had a single Aurora PostgreSQL instance with ~50 TB of data. It was used to index potential identity data in a much larger corpus of files stored in S3.
