by adaboese 860 days ago
Postgres is extremely expensive to scale. Why on Earth would you try to put a queue there?
11 comments

If you're running fewer than a million tasks a day through a queue and you already have PostgreSQL in your stack, why add anything else?
Even a million tasks a day is less than 12 a second. Most queues are going to have surges since that's part of the point of a queue, but it's still a few orders of magnitude away from what should overwhelm a database.
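For anyone who wants to check the "less than 12 a second" figure, here is a quick back-of-the-envelope sketch. The function name is made up for illustration, and it only gives the average rate, so it says nothing about how large the surges are.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def average_rate_per_second(tasks_per_day: int) -> float:
    """Average task rate, assuming tasks are spread evenly over the day."""
    return tasks_per_day / SECONDS_PER_DAY


rate = average_rate_per_second(1_000_000)
print(f"{rate:.2f} tasks/second")  # roughly 11.57, i.e. under 12
```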
Just use a dedicated tool. It is not that hard. If you want higher level abstraction, you have a whole spectrum of next gen queues, like Temporal, Trigger.dev, Inngest, Defer, etc.
Why use a dedicated tool if you have something in your stack that can solve the problem already?

The fewer separate pieces of infrastructure you run, the less likely it is that something will break that you don't know how to fix easily.

The article touched on this in the list of things to avoid when it said "I estimate each line of terraform to be an order of magnitude more risk/maintenance/faff than each line of Python" and "The need for expertise in anything beyond Python + Postgres"

Personally the next-gen-ness of an infrastructure component is inversely proportional to my trust in it.
Right, use boring technology!

https://boringtechnology.club/

Especially for something handling data. I want an old, battle-tested solution that won't disappear when the VC money dries up.
Maintaining extra infrastructure is expensive. Working around missing ACID is expensive. Depending on how many messages we are talking about, the cost of scaling postgres a bit more might be much lower.
https://microservices.io/patterns/data/transactional-outbox....

It allows you to wrap it all in a transaction. If you separate a database update and insertion into a queue, whichever happens second may fail, while the first succeeds.

The alternative is a saga pattern, or implementing a distributed two-phase commit.

Or actual XA, but that is cursed.

The mantra about premature optimization applies to infrastructure too.
Say you're running a SQL transaction and you queue a message in SQS. The message isn't part of your SQL transaction, so if the transaction fails, the SQS message won't roll back with it, potentially leading to inconsistencies. That's why it's sometimes better to keep the queue inside the SQL database: your data changes and queue messages always stay in sync. As a bonus, it simplifies the overall architecture and leaves fewer potential points of failure.
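A minimal sketch of that idea, for illustration only. It uses Python's built-in sqlite3 as a stand-in for Postgres so it runs anywhere; the `orders` and `outbox` tables and the `place_order` helper are hypothetical names, not from the article. The point is just that the data row and the queue row commit or roll back together.

```python
import sqlite3


def place_order(conn: sqlite3.Connection, item: str, fail: bool = False) -> bool:
    """Insert an order and its queue message in a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
            conn.execute(
                "INSERT INTO outbox (payload) VALUES (?)",
                (f"order_created:{item}",),
            )
            if fail:
                # Simulated failure after both inserts (hypothetical).
                raise RuntimeError("simulated failure")
    except RuntimeError:
        return False
    return True


def count(conn: sqlite3.Connection, table: str) -> int:
    """Row count for one of the two fixed table names used in this sketch."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)")

ok = place_order(conn, "book")                  # both rows are committed
failed = place_order(conn, "lamp", fail=True)   # both rows are rolled back

print(ok, failed, count(conn, "orders"), count(conn, "outbox"))
```

With an external queue, the second insert would be a network call that cannot be undone by the rollback, which is exactly the inconsistency described above.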
If your infrastructure needs to scale that much, then PostgreSQL indeed isn't the right tool. The right tool for your use case probably doesn't even exist, and you will have to build one.

It is no mystery why all web-scale companies end up designing their own DB technology.

That being said, most DBs in the wild are nowhere near that scale. I have seen my share of PostgreSQL/Elasticsearch combos handling less than a TB of data and collapsing under the over-engineering of administering two DBs in one app.

If you need scaling. Not all applications need scaling (e.g. I'm building an internal tool for a company that has 1000 employees; it's unlikely that the number of employees will double from one day to the next), and for most applications a single PostgreSQL server, either local or in the cloud, is enough.
If the choice is between using a dedicated queue and postgres for your data vs. using postgres for both, using postgres for both makes perfect sense.

The scale at which you would outgrow using postgres for the queue, you would also outgrow using postgres for the data.

At what point does one outgrow Postgres for the data?
Because you're not expecting to have to scale beyond 1 instance for the next few years, you're already using Postgres, and now everything is trivially transactional. KISS.
Any DB is fine as a queue depending on requirements, design and usage.
lol no it’s not.