Hacker News
by q3k 53 days ago
Yes, you can scale it quite well vertically.

But how about horizontally? It would be nice to have high availability, or even to be able to upgrade the OS and postgres itself without downtime.


Shameless plug[0].

[0] https://pgdog.dev

AFAIK that's what Multigres[0] and Neki[1] are trying to solve.

[0] https://multigres.com/
[1] https://neki.dev/

I’ve only played around with it, but you can use Patroni, etcd, and HAProxy to achieve this. It’s a pain, but I believe there was some kind of Coolify-style open-source application that does this for you; I can’t for the life of me remember its name.
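For the Patroni + etcd + HAProxy route, HAProxy is usually pointed at each node's Patroni REST API (port 8008 by default). There, `GET /primary` answers 200 only on the current leader, so client traffic follows a failover. Below is a minimal Python sketch of that same health-check logic, assuming hypothetical hostnames `pg1`..`pg3`. In a real setup, HAProxy's `httpchk` does this for you.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional

# Patroni's REST API listens on 8008 by default; /primary returns 200 only on the leader.
PATRONI_PORT = 8008


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status for GET url, or 0 if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (e.g. 503 from a replica) are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: treat as down.
        return 0


def find_primary(hosts: Iterable[str],
                 probe: Callable[[str], int] = http_status) -> Optional[str]:
    """Return the first host whose /primary check answers 200, mirroring an HAProxy httpchk."""
    for host in hosts:
        if probe(f"http://{host}:{PATRONI_PORT}/primary") == 200:
            return host
    return None
```

The `probe` parameter exists so the lookup can be exercised without a live cluster, e.g. `find_primary(["pg1", "pg2"], probe=lambda url: 200 if "pg2" in url else 503)` returns `"pg2"`.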
You might be thinking of Pigsty?

At least I hope you are! Nothing else has been as well battle-tested. Unfortunately, perhaps because of its name, it gets no face time on HN. Its last few mentions here barely received the attention they deserved.

autobase[1] is the one I can think of

[1] https://github.com/autobase-tech/autobase

Yep, this is what I think about when “scaling” is mentioned. Maybe I’m too distributed-compute brained, but throwing CPU at a db isn’t what I was hoping would be the answer.
So the point of distributed compute is to reduce the compute needed? I’ve generally found that distributed compute requires more compute than vertical scaling while getting clobbered by network bandwidth / latency.

Theoretically it takes 2 to 10x the compute, and in practice 100 to 500x.

I think for databases horizontal scaling for writes only makes sense once vertical scaling stops working. It comes with high complexity, annoying limitations, and often higher cost.
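To make the "high complexity, annoying limitations" part concrete: a common way to scale writes horizontally is to shard by a key so that each row's writes land on exactly one node. Here is a minimal Python sketch, assuming a hypothetical `customer_id` shard key and simple modulo hashing. It is not how any particular tool (Multigres, Neki, PgDog) actually does it.

```python
import hashlib
from typing import Dict, List


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index using a stable hash (same key -> same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route_writes(rows: List[Dict[str, str]], num_shards: int,
                 key_field: str = "customer_id") -> Dict[int, List[Dict[str, str]]]:
    """Group rows by owning shard; each group can then be written locally on one node."""
    batches: Dict[int, List[Dict[str, str]]] = {}
    for row in rows:
        batches.setdefault(shard_for(row[key_field], num_shards), []).append(row)
    return batches
```

The limitations show up quickly. A transaction that touches keys on two shards needs cross-shard coordination. With plain modulo hashing, changing `num_shards` also remaps most keys, which is why real systems tend to use a fixed number of virtual shards or consistent hashing.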

Horizontal scaling for reads, on the other hand, is much easier. If you have multiple replicas for high availability, you might as well put them to work. It can also reduce the risk of read-heavy tasks interfering with transaction processing. You can even go a step further and replicate to a database that's optimized for analytical tasks.
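Putting replicas to work usually means read/write splitting, either in a proxy or in the application. Here is a minimal Python sketch of the routing decision, assuming hypothetical host names and a deliberately naive SQL check. A `SELECT` that calls a writing function would be misrouted, and replicas can lag behind the primary, so stale reads are possible.

```python
import itertools
import re
from typing import List

# Locking reads (SELECT ... FOR UPDATE / FOR SHARE etc.) cannot run on a hot standby.
LOCKING_CLAUSE = re.compile(r"\bfor\s+(?:no\s+key\s+update|update|key\s+share|share)\b")


class ReadWriteRouter:
    """Send writes to the primary and spread plain reads round-robin across replicas."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        # With no replicas, reads simply fall back to the primary.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql: str, in_transaction: bool = False) -> str:
        """Return the host that should run this statement."""
        stmt = sql.strip().lower()
        # Inside an explicit transaction, stay on the primary so reads see
        # the transaction's own writes.
        if in_transaction:
            return self.primary
        # Plain SELECTs may go to a replica; locking reads and everything else
        # (INSERT, UPDATE, DDL, ...) must hit the primary.
        is_plain_read = stmt.startswith("select") and not LOCKING_CLAUSE.search(stmt)
        return next(self._replicas) if is_plain_read else self.primary
```

For example, with `ReadWriteRouter("primary", ["replica1", "replica2"])`, two `SELECT`s go to `replica1` and then `replica2`, while an `INSERT` or a `SELECT ... FOR UPDATE` goes to `primary`.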

Horizontal scaling for stateless applications (e.g. web servers or job processors) is often easier and more robust than vertical scaling, with little to no downsides.

The point of distributed computing is to do computing that you can't do on a vertically scaled system or to increase availability.

If you're doing it for other reasons it's usually a mistake.

The advice I’ve gotten is that you want to move computation to data that is already distributed. The cost of moving large amounts of data usually dwarfs compute costs (usually, not always), and so the performance win comes from distributing the computation and then (depending on the problem) centralizing aggregate results.
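The "distribute the computation, centralize the aggregate" pattern in miniature: each node reduces its own rows to a tiny partial result, and only those partials cross the network. Here is a minimal Python sketch, assuming in-memory lists stand in for per-node data. It ships `(sum, count)` rather than per-node averages, because averaging averages gives the wrong answer when nodes hold different numbers of rows.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Partial:
    """Per-node partial aggregate: two numbers instead of every row."""
    total: float
    count: int


def local_aggregate(values: Iterable[float]) -> Partial:
    """Runs next to the data on each node."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return Partial(total, count)


def combine(partials: List[Partial]) -> float:
    """Runs on the coordinator: merge the partials into a global mean."""
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    if count == 0:
        raise ValueError("no rows on any node")
    return total / count
```

With nodes holding `[1.0, 2.0, 3.0]`, `[10.0]`, and `[]`, `combine([local_aggregate(s) for s in nodes])` gives `4.0` (16 / 4), whereas naively averaging the non-empty per-node means would give 6.0.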
Another pretty good reason to do distributed computing is to move the computation closer to where the data is or where the data will be consumed.
It's practically trivial to do in 2026, even by hand, and there are a couple of ready-to-use solutions that automate it.

If you're running it in Kubernetes with CloudNativePG, it's even easier.

The only thing it doesn't do well is master-master replication, which is why most of these "does it scale" posts mostly talk about how slow writes are. And they are pretty slow.
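A back-of-the-envelope illustration of why writes become the bottleneck: with a single primary plus streaming replicas (no master-master), adding replicas raises the read ceiling but not the write ceiling. Here is a tiny Python model with made-up per-node numbers. It deliberately ignores replication lag and the fact that reads on the primary compete with its writes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NodeCapacity:
    """Hypothetical per-node throughput, in queries per second."""
    reads_per_sec: float
    writes_per_sec: float


def cluster_capacity(node: NodeCapacity, replicas: int) -> Tuple[float, float]:
    """Rough (read, write) ceilings for one primary plus `replicas` streaming replicas.

    Reads can be served by the primary and every replica, so read capacity grows
    with the replica count. Every write must still execute on the primary (and be
    replayed on each replica), so write capacity stays at one node's worth.
    """
    read_capacity = node.reads_per_sec * (1 + replicas)
    write_capacity = node.writes_per_sec
    return read_capacity, write_capacity
```

With a node that handles 1000 reads/s and 200 writes/s, three replicas give roughly `(4000.0, 200.0)`: four times the reads and exactly the same writes.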

It all depends on the storage underneath. If you've got good storage, with CNPG you get results comparable to bare-metal PostgreSQL.
Only reads scale. You get (much) worse writes for sure.