by zbobet2012 3234 days ago
You can _easily_ buy a box with 60+TB of SSD...

http://www.dell.com/en-us/work/shop/povw/poweredge-r930

Some of us do need to shard for sure, though (I have multi-petabyte data sets).


Storing 60+TB of data is different from searching and doing complex computation on 60TB of data.

Also, operations on such a huge data set can be really painful. Think about how to back up a DB like that safely, or how to upgrade the engine.

Here are some slides (a little old, from 2014) about a huge Postgres instance serving as the backend for leboncoin.fr (the main classified advertising website in France).

https://fr.slideshare.net/jlb666/pgday-fr-2014-presentation-...

Basically, they bought the best hardware money could buy at the time to scale vertically. In the end, they ran into some issues and started thinking about sharding this huge DB.

Absolutely. Queries on 60TB of data can certainly merit more than one box. Hell, queries on 1TB of data can merit more than one box.

I have a workload that runs close to 1.2 million TPS for hours at a time and needs response times under 100 milliseconds at the 99th percentile. That uses more than one box and sits (replicated) in RAM.

However, 5TB of data really _isn't_ that much on modern SSDs. You can fit a sizable chunk of that in RAM on a decent server, so you probably _don't_ need more than one box.

"I have 5TB of data that needs to sit on an SSD" is, to be honest, a really poor performance metric. If you are genuinely speccing out hardware and a database, a better statement would be:

"I have 5TB of relational data, with a Pareto distribution for access, at a peak of 100K TPS." Then we can start talking about what solves the problem.
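
To show why the access distribution matters, here is a rough back-of-the-envelope sketch in Python. It assumes an idealized "80/20" Pareto access pattern (20% of the data takes 80% of the accesses) and one storage read per transaction on a cache miss. Both are assumptions for illustration, not measurements of any real workload, and the function names are made up.

```python
import math

# Assumption: classic 80/20 Pareto. The shape parameter alpha is chosen so
# that the hottest 20% of the data receives 80% of the accesses.
ALPHA_80_20 = math.log(5) / math.log(4)


def access_share(hot_fraction, alpha=ALPHA_80_20):
    """Share of accesses that hit the hottest `hot_fraction` of the data.

    Follows from the Lorenz curve of a Pareto distribution:
    share = hot_fraction ** (1 - 1/alpha).
    """
    return hot_fraction ** (1.0 - 1.0 / alpha)


def hot_data_tb(total_tb, target_hit_rate, alpha=ALPHA_80_20):
    """TB of the hottest data that must be cached to reach `target_hit_rate`."""
    exponent = 1.0 - 1.0 / alpha
    return total_tb * target_hit_rate ** (1.0 / exponent)


def miss_tps(peak_tps, hit_rate):
    """Transactions per second that fall through to storage.

    Assumption: each transaction touches one item, so a miss is one read.
    """
    return peak_tps * (1.0 - hit_rate)


if __name__ == "__main__":
    total_tb, peak_tps = 5.0, 100_000  # the numbers from the statement above
    for hit_rate in (0.80, 0.90, 0.95):
        cache_tb = hot_data_tb(total_tb, hit_rate)
        print(f"hit rate {hit_rate:.0%}: cache ~{cache_tb:.2f} TB, "
              f"~{miss_tps(peak_tps, hit_rate):,.0f} TPS go to SSD")
```

Under these assumptions the hot set for an 80% hit rate is about a fifth of the data, which is the kind of number you can actually size RAM and SSD IOPS against. With a flatter or steeper distribution the answer changes a lot, which is the whole point of stating it.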