by Ozzie_osman 816 days ago
Curious why you needed to shard at 7 TB? I can imagine that for some workloads, especially write-heavy ones, you might start hitting constraints around vacuuming and things like that. But 7 TB should be manageable on a single (somewhat large and beefy) machine.

You're right, we could. In fact, it was a single server until about 2 TB. We considered a larger server, and at that point we could have just added a few more disks. But we still decided to shard.

First, the data size is growing and we didn't really know the growth rate in advance. Sharding gives you some flexibility in infrastructure sizing. And yes, you don't want to wait until the last minute.

Second, it helps us spread the disk I/O. That's possible on a single machine too if you're a little bit careful with disk types and sizes. But again, the overall load still grows.

Third, all the bulk operations take a long time on a single server. Each of the distributed servers takes about an hour to back up and 2-3 hours to restore. I'd feel uneasy if it was much longer.
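
To make the backup/restore point concrete, here is a rough back-of-the-envelope sketch. It assumes restore time grows linearly with data size and that data is spread evenly across shards, neither of which is guaranteed in practice. The shard count of 4 and the function name are hypothetical, chosen only for illustration; only the 2-3 hour per-shard restore time comes from the numbers above.

```python
def estimate_single_server_hours(shard_count: int, per_shard_hours: float) -> float:
    """Estimate how long a bulk operation would take on one combined server.

    Assumes the duration grows linearly with data size and that the data is
    spread evenly across shards; real backups and restores may not scale this way.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return shard_count * per_shard_hours


# Hypothetical shard count; the thread does not state the real number.
SHARD_COUNT = 4

# Per-shard restore time from the comment above: roughly 2 to 3 hours.
low = estimate_single_server_hours(SHARD_COUNT, 2.0)
high = estimate_single_server_hours(SHARD_COUNT, 3.0)
print(f"Estimated single-server restore: {low:.0f} to {high:.0f} hours")
```

Since shards can be restored in parallel, the sharded setup stays at roughly the per-shard time, while the single-server estimate scales with the total data size under these assumptions.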

Don’t wait until the last possible second to make a big strategic move. Do it early, on your own schedule, especially when you're growing at a high rate.