Hacker News
by levl289 51 days ago
Yep, this is what I think about when “scaling” is mentioned. Maybe I’m too distributed-compute brained, but throwing CPU at a db isn’t what I was hoping would be the answer.

So the point of distributed compute is to reduce the compute needed? I’ve generally found that distributed compute requires more compute than vertical scaling while getting clobbered by network bandwidth / latency.

In theory it requires 2 to 10x the compute, and in practice 100 to 500x.

I think for databases horizontal scaling for writes only makes sense once vertical scaling stops working. It comes with high complexity, annoying limitations, and often higher cost.

Horizontal scaling for reads, on the other hand, is much easier. If you have multiple replicas for high availability, you might as well put them to work. It can also reduce the risk of read-heavy tasks interfering with transaction processing. You can even go a step further and replicate to a database that's optimized for analytical tasks.
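
To make the read-replica point concrete, here is a minimal sketch of read/write splitting in Python. Everything in it is hypothetical: the `Router` class, the `target_for` method, and the node names are made up for illustration, and the "starts with SELECT" check is a deliberately naive stand-in for real query classification.

```python
import itertools


class Router:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin iterator over replicas; None when there are no replicas.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql):
        # Naive assumption: only statements that start with SELECT are reads.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, and reads when no replica exists, go to the primary.
        return self.primary


router = Router("primary", ["replica-1", "replica-2"])
first_read = router.target_for("SELECT * FROM orders")
second_read = router.target_for("select count(*) from orders")
write = router.target_for("UPDATE orders SET paid = true WHERE id = 1")
```

A real setup would also have to deal with replication lag (a read right after a write may need to hit the primary) and with reads that happen inside a write transaction, which this sketch ignores.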

Horizontal scaling for stateless applications (e.g. web servers or job processors) is often easier and more robust than vertical scaling, with little to no downsides.

The point of distributed computing is to do computing that you can't do on a vertically scaled system or to increase availability.

If you're doing it for other reasons it's usually a mistake.

The advice I’ve gotten is that you want to move computation to data that is already distributed. The cost of moving large amounts of data usually dwarfs compute costs (usually, not always), and so the performance win comes from distributing the computation and then (depending on the problem) centralizing aggregate results.
Another pretty good reason to do distributed computing is to move the computation closer to where the data is or where the data will be consumed.
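
A rough illustration of "move the computation to the data, then centralize the aggregates", using a global mean as the example. The partitions here are plain in-memory lists standing in for data that lives on separate nodes, and `local_aggregate` / `combine` are hypothetical names. Each node ships back only a (sum, count) pair instead of its raw rows.

```python
def local_aggregate(partition):
    # Runs where the data lives; returns a small summary instead of raw rows.
    return (sum(partition), len(partition))


def combine(partials):
    # Runs centrally; only the per-partition summaries cross the network.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else None


# Stand-ins for data held on three separate nodes (assumed for illustration).
partitions = [[1.0, 2.0, 3.0], [10.0, 20.0], [5.0]]
partials = [local_aggregate(p) for p in partitions]
mean = combine(partials)
```

This only works this neatly for aggregates that decompose into mergeable partial results (sum, count, min, max); something like an exact median needs more data movement or an approximation.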