Hacker News new | ask | show | jobs
by asguy 1150 days ago
I think you're projecting an implementation of a queue in Postgres, which isn't how most people implement these things. [0] We're not doing table level locks, or creating contention with multiple queue producers or consumers, and they're not "one typo away from disaster in prod".

To do this right, you're using row level locking e.g. SELECT FOR UPDATE/SKIP LOCKED [1], and hopefully you're already using idle_in_transaction_session_timeout to deal with total consumer failures. A properly designed queue in Postgres runs more-or-less in parallel, and supports (really fantastic features like) atomic row locks across all resources needed to serve the queue request.

If you need extremely long consumer timeouts, it's also totally fine to use RLLs in addition to state on the job itself.

[0] - https://www.crunchydata.com/blog/message-queuing-using-nativ... [1] - https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock...

1 comments

Even that first link explicitly calls out MVCC table bloat with that strategy? Like, you can do it right (and I have no comment on your implementation in particular, let's assume for the sake of argument it's fantastic), but it's easy to write a right-looking solution with the property that innocuous-looking one-line changes cause major issues. I wouldn't want an average development team to pursue that approach without good reason, especially when the alternatives are so easy to do right.
The first link also calls out how to deal with said bloat. It’s part of administering a Postgres DB yourself (which may or may not be something a team should be doing).

> I wouldn't want an average development team to pursue that approach without good reason, especially when the alternatives are so easy to do right.

Agreed. Good reasons I’ve had in the past are:

- wanting transactional guarantees across your DB data and queue (which you don’t get if your queue is external)

- not wanting to add more stack complexity (this has been an issue when you have to support on-prem deployments that you aren’t allowed to interact with).

I’m sure there are others.