by kakwa_
3236 days ago
Storing 60+TB of data is different from searching and doing complex computation on 60TB of data. Also, operations on such a huge data set can be really painful. Think about how to back up a DB like that safely, or how to update the engine. Here are some slides (a little old, from 2014) about a huge Postgres instance serving as the backend for leboncoin.fr (the main classified advertising website in France): https://fr.slideshare.net/jlb666/pgday-fr-2014-presentation-... Basically, they bought the best hardware money could buy at the time to scale vertically, but in the end they ran into some issues and started thinking about sharding this huge DB.
|
I have a workload that runs close to 1.2 million TPS for hours at a time and needs less than 100 millisecond response times at the 99th percentile. That workload uses more than one box and sits (replicated) in RAM.
However, 5TB of data really _isn't_ that much on modern SSDs. You can fit a sizable chunk of that in RAM on a decent server, so you probably _don't_ need more than one box.
"I have 5TB of data that needs to sit on an SSD" is, to be honest, a really poor performance metric. If you are genuinely speccing out hardware and a database, a better statement would be:
"I have 5TB of relational data, with a Pareto distribution for access, at a peak of 100K TPS". Then we can start talking about what solves the problem.
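To show why that second statement is more useful, here is a rough back-of-the-envelope sketch in Python. It assumes the access pattern follows the classic 80/20 Pareto shape (the hottest 20% of the data gets 80% of the accesses), which is my assumption and not something stated above; the function names are made up for illustration. It estimates how much of the 5TB would need to sit in RAM for a given hit rate, and how many TPS would fall through to the SSD.

```python
import math


def pareto_exponent(top_share=0.2, access_share=0.8):
    """Exponent k such that the hottest fraction p of the data
    receives p**k of all accesses (Lorenz curve of a Pareto law).
    The 80/20 default is an assumption, not a measured value."""
    return math.log(access_share) / math.log(top_share)


def hot_fraction(hit_rate, k):
    """Fraction of the data that must be cached in RAM so that
    `hit_rate` of the accesses are served from RAM."""
    return hit_rate ** (1.0 / k)


def size_cache(data_tb, peak_tps, hit_rate, k):
    """Rough RAM size (TB) and the TPS left over for the SSD."""
    return {
        "ram_tb": data_tb * hot_fraction(hit_rate, k),
        "miss_tps": peak_tps * (1.0 - hit_rate),
    }


if __name__ == "__main__":
    k = pareto_exponent()  # assumed 80/20 shape
    for hit_rate in (0.80, 0.90, 0.95):
        plan = size_cache(5.0, 100_000, hit_rate, k)
        print(f"hit rate {hit_rate:.0%}: "
              f"{plan['ram_tb']:.2f} TB in RAM, "
              f"{plan['miss_tps']:.0f} TPS to SSD")
```

Under that assumed shape, an 80% hit rate needs about 1TB of the 5TB in RAM and leaves about 20K TPS for the SSD. The real numbers depend entirely on the measured access distribution, which is the point of asking for it.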