


	by eikenberry 1 day ago
	> The reason DBs like Mongo or Dynamo exist is because Postgres has a scaling problem.

	I've used Postgres at a few places and the #1 problem was always high availability, not scaling. One Postgres cluster could easily handle 100000 transactions per minute, but when a primary node went down it meant a page, manually failing over to the spare, and then manually replacing the spare. The manual tooling was very finicky but at least it worked; no automated solution came even close. Lack of a good HA story is why I avoid self-managed Postgres as much as possible.

6 comments

levkk 1 day ago

Good thing we support HA as well: https://docs.pgdog.dev/features/load-balancer/

Load balancer with health checks and failover, works out of the box. :) Battle-tested at this point too, so could be worth a look.

r7n 1 day ago

I've extensively used Dynamo (internally at Amazon and externally) and even founded a DB startup with it at its core. Boiling down the scalability of Postgres vs Dynamo the way the blog does is a bit terse. Dynamo scales writes horizontally with the keyspace, forever. Postgres simply can't, and no number of layers between the machines and the developer changes that. Sharding, pooling, and Citus are all layered on top of an engine where a given row's writes still land on one primary.

bofaGuy 1 day ago

DynamoDB isn’t even good at being a KV store. Almost every time, we also have to back it with S3 because of size limitations.
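
A minimal sketch of the overflow pattern this comment describes, assuming plain dicts as stand-ins for the DynamoDB table and the S3 bucket. `put_value`, `get_value`, the `overflow/` prefix, and the 350 KB threshold are all hypothetical choices for illustration; the only external fact relied on is DynamoDB's documented 400 KB item size limit.

```python
from typing import Dict

# Assumption: DynamoDB's documented item size limit is 400 KB, and that limit
# also counts attribute names, so this sketch overflows well below it.
INLINE_LIMIT_BYTES = 350 * 1024


def put_value(table: Dict[str, dict], blobs: Dict[str, bytes], key: str, value: bytes) -> None:
    """Store small values inline; park large ones in the blob store and keep a pointer."""
    if len(value) <= INLINE_LIMIT_BYTES:
        table[key] = {"inline": value}
    else:
        blob_key = f"overflow/{key}"
        blobs[blob_key] = value
        table[key] = {"blob_key": blob_key}


def get_value(table: Dict[str, dict], blobs: Dict[str, bytes], key: str) -> bytes:
    """Read a value back, following the pointer into the blob store if needed."""
    item = table[key]
    if "inline" in item:
        return item["inline"]
    return blobs[item["blob_key"]]


table: Dict[str, dict] = {}   # hypothetical stand-in for a DynamoDB table
blobs: Dict[str, bytes] = {}  # hypothetical stand-in for an S3 bucket
put_value(table, blobs, "small", b"x" * 10)
put_value(table, blobs, "large", b"x" * (1024 * 1024))
```

The cost of this design is a second round trip on reads of large values, plus two stores to keep consistent on writes and deletes.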

moomoo11 23 hours ago

did you use single table design?

and yeah you have to spend a lot of upfront time designing your data models

ngc248 19 hours ago

If you know your access patterns really well and they are non-relational, then you can design the best possible tables for DynamoDB. In such a case DynamoDB works and scales amazingly. Ofc, you cannot do multi-table relationships etc.; shoehorning a relational schema onto DynamoDB does not work.

zamalek 1 day ago

Dynamo is a fundamentally different DB to Postgres. If your problem fits into the Dynamo approach (I'd argue that more problems do), then you should be using it. Not all problems fit, though.

r7n 1 day ago

Agreed, my critique was about how the article frames scalability. I've yet to see an OLTP problem that can't live in something like Dynamo. KV can model anything if you put in the work; the question is how much modeling discipline you trade for the scale, and in my experience the up-front work is always worth it. Most of the time operational issues are swept under the rug and not considered tech debt.

Take for example AuroraDB: the sheer engineering it took to make SQL do scalable OLTP at all tells you how much that flexibility actually costs to keep.

ah27182 17 hours ago

Upfront modeling work is always worth it, but that only holds if you actually know your access patterns upfront. Most teams don’t, especially early on.

jsw 1 day ago

Curious how the DB startup with Dynamo at its core went. We use it heavily. The primary tricky thing for us at the moment is aligning pricing with workload value.

r7n 1 day ago

We obsessed over optimizations and pushed the APIs to the limits of how tightly we could pack data.

So much so that we rewrote the DynamoDB SDK to squeeze out more optimizations, so we could be the same cost even though we were a layer in front of Dynamo. We used key encoding and various other techniques, as well as managed capacity (on demand vs reserved), to transparently optimize workloads for price. In our experience we saw dramatic gains vs just vanilla SDK usage.

If you're curious, here was the marketing website, but we're now part of Databricks: https://stately.cloud/

jsw 1 day ago

Interesting! We also interact with the low-level APIs rather than the SDK, and we have built on top of them: an IO scheduler for request batching and connection management, request hedging, full MVCC transactions, etc. We store raw bytes in DDB and manage schema/etc elsewhere. Curious if there is other low-hanging fruit, or not so low, that you found and we haven't discovered yet.
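
For readers unfamiliar with request hedging, here is a minimal sketch of the general technique, not of this commenter's implementation. It assumes an idempotent read; `hedged`, `hedge_after`, and `flaky_read` are hypothetical names, and the sleeps simulate one slow and one fast backend response.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def hedged(call: Callable[[], Awaitable[T]], hedge_after: float) -> T:
    """Run call(); if it has not finished after hedge_after seconds, start a
    second attempt and return whichever attempt finishes first."""
    first = asyncio.ensure_future(call())
    done, _ = await asyncio.wait({first}, timeout=hedge_after)
    if done:
        return first.result()
    second = asyncio.ensure_future(call())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop the slower attempt
    return done.pop().result()


async def demo() -> str:
    attempts = 0

    async def flaky_read() -> str:
        nonlocal attempts
        attempts += 1
        # Simulated latency: the first attempt is slow, later ones are fast.
        await asyncio.sleep(1.0 if attempts == 1 else 0.01)
        return f"attempt-{attempts}"

    return await hedged(flaky_read, hedge_after=0.05)


result = asyncio.run(demo())
```

Hedging trades extra requests for lower tail latency, so it only makes sense for operations that are safe to issue twice. This sketch also propagates the exception if the first attempt to finish fails, rather than waiting for the other one.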

cherioo 1 day ago

Except that dynamo is still just glorified mysql? https://news.ycombinator.com/item?id=18871661

I don’t think the backend matters. It’s the frontend wrapper that makes or breaks HA.

inigyou 1 day ago

If Dynamo is glorified MySQL then Hacker News is also glorified MySQL. The system is the whole system, not just one part of it.

eikenberry 1 day ago

That's great news! I'll bookmark this in case I'm forced to manage Postgres again.

MeetingsBrowser 1 day ago

What do you use instead?

eikenberry 1 day ago

I tend towards using key-value databases, as I find them general purpose enough while being much more robust. I'm not married to any one in particular; it depends on the requirements.

doctorpangloss 1 day ago

Is a load balancer HA?

gchamonlive 1 day ago

Not by itself if it's naive, but if it's able to assess target health and avoid degraded instances, then it becomes one component of HA, the other being an integrated orchestrator for graceful recovery.

doctorpangloss 1 day ago

from their docs:

> PgDog does not detect primary failure and will not call pg_promote(). It is expected that the databases are managed externally by another tool, like Patroni or AWS RDS, which handle replica promotion.

nikolatt 1 day ago

Why the snarky comment? The PgDog project has been around for a while; it's not vibe coded.

znpy 1 day ago

Not gp but I didn’t perceive any snark in the comment you are replying to

doctorpangloss 1 day ago

okay, it does appear that the LLM didn't write any of this. i guess the simple answer is that it is not HA.

dev-ns8 1 day ago

Combined with a replication strategy and automated health checks, a load balancer could direct traffic to a healthy instance automatically.
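
A minimal sketch of the routing half of that idea, assuming the health status is supplied by some external check. `Target`, `pick_target`, and the `pg-*` names are hypothetical. Note that this only steers traffic away from unhealthy instances; promoting a replica to primary is a separate job, which per the docs quoted above PgDog leaves to a tool like Patroni or RDS.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Target:
    name: str
    is_primary: bool


def pick_target(
    targets: List[Target],
    is_healthy: Callable[[Target], bool],
    want_primary: bool,
) -> Optional[Target]:
    """Return the first healthy target with the requested role, or None."""
    for target in targets:
        if target.is_primary == want_primary and is_healthy(target):
            return target
    return None


targets = [Target("pg-a", True), Target("pg-b", False), Target("pg-c", False)]
down = {"pg-b"}  # pretend the health check marked this replica as failed


def healthy(target: Target) -> bool:
    return target.name not in down


read_target = pick_target(targets, healthy, want_primary=False)
write_target = pick_target(targets, healthy, want_primary=True)
```

If the primary itself is marked down, `pick_target(..., want_primary=True)` returns `None`: writes have nowhere to go until something outside the load balancer promotes a replica.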

dotancohen 1 day ago

What happens when the load balancer fails?

inigyou 1 day ago

HA has to be all the way through, in which case you might not need a load balancer because each client already connects to a separate server. If you do, then you can have one load balancer per client machine.

parthdesai 1 day ago

Patroni 1.0 was released in 2016, i.e. ~10 years ago.

https://github.com/patroni/patroni

eikenberry 9 hours ago

Noted. If I ever have to administer a Postgres setup again I'll take a look. Thanks.

nijave 1 day ago

Yup, Patroni handles automatic failover and cluster management quite well.

tempest_ 1 day ago

Patroni serves this niche pretty well at this point.

globular-toast 1 day ago

Have you looked into things like CloudNativePG? https://cloudnative-pg.io/

nijave 1 day ago

CNPG is quite nice and robust, but I'd still be a bit reluctant to stack PG on k8s for really big clusters: the k8s ecosystem moves quite quickly, and there's lots of patching/maintenance/churn, which means more PG failovers. So it depends on how well your workload handles those (they normally take only a few seconds).

globular-toast 23 hours ago

Most K8s upgrades can happen independently of node reboots etc.; you only really need to reboot for OS updates, but that would be true of anywhere you run PG, even RDS.

nijave 13 hours ago

>but that would be true of anywhere you run PG, even RDS

It's a little easier to strip down userland if the machine is only running PG. It's technically possible on k8s with distros like Talos, Bottlerocket, etc., but you still have all the k8s deps on top of PG. It's also a little easier to do defense-in-depth on a dedicated PG machine, which means you might have mitigating controls in place that let you skip security patches (minimal kernel modules, SELinux). That's possible on k8s too, but now you're fighting through a 2nd layer of configuration.

RDS is a bit of a special case because you also have AWS curating and prioritizing updates. You can do that yourself but it's a bit of a time sink scrutinizing every upgrade to see if you _really_ need it. Our RDS instances tend to go 3+ months without restarts

pinkgolem 1 day ago

Have you tried CNPG? Worked amazingly well for my use cases.

VirusNewbie 1 day ago

~1600 TPS is not 'high scale'.
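
For reference, the ~1600 figure presumably comes from the 100000 transactions per minute quoted at the top of the thread. A quick conversion, with variable names chosen here for illustration:

```python
transactions_per_minute = 100_000  # figure quoted at the top of the thread
tps = transactions_per_minute / 60
print(round(tps))  # 1667
```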

inigyou 1 day ago

Pretty good for 98% of projects though.
