by kakwa_ 3236 days ago
Storing 60+TB of data is different than searching and doing complex computation on 60TB of data.

Also, operations on such a huge data set can be really painful. Think about how to back up a DB like that safely, or how to upgrade the engine.

Some slides (a little old, from 2014) about a huge Postgres instance serving as the backend for leboncoin.fr (the main classified advertising website in France).

https://fr.slideshare.net/jlb666/pgday-fr-2014-presentation-...

Basically, they bought the best hardware money could buy at the time to scale vertically. In the end, they ran into some issues and started thinking about sharding this huge DB.

1 comments

Absolutely. Queries on 60TB of data can certainly merit more than one box. Hell, queries on 1TB of data can merit more than one box.

I have a workload that runs close to 1.2 million TPS for hours at a time and needs response times under 100 milliseconds at the 99th percentile. That uses more than one box and sits (replicated) in RAM.

However, 5TB of data really _isn't_ that much on modern SSDs. You can fit a sizable chunk of that in RAM on a decent server, so you probably _don't_ need more than one box.

"I have 5TB of data that needs to sit on an SSD" is, to be honest, a really poor performance metric. If you are genuinely speccing out hardware and a database, a better statement would be:

"I have 5TB of relational data, with a Pareto distribution for access, at a peak of 100K TPS". Then we can start talking about what solves the problem.
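
To show why that second statement is more useful, here is a rough back-of-the-envelope sketch. It assumes things the statement above does not say: an 80/20 Pareto shape, a hypothetical 1TB of RAM used as a perfect cache for the hottest data, and one data access per transaction. The function names are made up for illustration.

```python
import math

# Pareto shape for which the hottest 20% of the data receives 80% of the
# accesses (the classic 80/20 rule). This is an assumed shape, not a measurement.
ALPHA_80_20 = math.log(5) / math.log(4)


def hot_share(cached_fraction: float, alpha: float = ALPHA_80_20) -> float:
    """Share of accesses that land on the hottest `cached_fraction` of the data.

    Derived from the Lorenz curve of a Pareto distribution with shape alpha > 1:
    share = cached_fraction ** (1 - 1 / alpha).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    if alpha <= 1.0:
        raise ValueError("alpha must be greater than 1")
    return cached_fraction ** (1.0 - 1.0 / alpha)


def disk_tps(data_tb: float, ram_tb: float, peak_tps: float,
             alpha: float = ALPHA_80_20) -> float:
    """Estimated transactions per second that miss RAM and reach the SSD.

    Assumes RAM holds exactly the hottest data and that each transaction
    touches a single item.
    """
    cached_fraction = min(1.0, ram_tb / data_tb)
    return peak_tps * (1.0 - hot_share(cached_fraction, alpha))


if __name__ == "__main__":
    # 5TB of data, 1TB of RAM (assumed), 100K TPS at peak.
    print(f"hit share: {hot_share(1 / 5):.2f}")
    print(f"TPS reaching SSD: {disk_tps(5, 1, 100_000):.0f}")
```

Under those assumptions, roughly a fifth of the peak load reaches the SSD, which is a number you can actually size hardware against. A flatter or steeper access distribution changes the answer a lot, and that is the point of stating it.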