Hacker News
by mind-blight 56 days ago
The vacuum pressure is real. Using a system with the SKIP LOCKED technique + polling caused massive DB perf issues as the queue depth grew. The query to see the current jobs in the queue ended up being the main performance bottleneck, which caused slower throughput, which caused a larger queue depth, and so on in a feedback loop.

Scaling the workers sometimes exacerbates the problem because you run into connection limits or polling hammering the DB.

I love the idea of pg as a queue, but I'm more skeptical of it after dealing with it in production.
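
For readers who have not seen the pattern this comment describes, here is a minimal sketch of a SKIP LOCKED dequeue. It is not the commenter's code: the `jobs` table, its columns, and the `claim_batch` helper are all hypothetical, and the connection is assumed to be a DB-API style one (psycopg-like, with `%s` placeholders). The relevant point is that every claim is an UPDATE, so each processed job leaves a dead tuple behind for VACUUM to clean up.

```python
# Hypothetical schema (an assumption, not from the thread):
#   jobs(id bigint, payload text, status text, created_at timestamptz)
CLAIM_SQL = """
UPDATE jobs
SET status = 'running'
WHERE id IN (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT %s
    FOR UPDATE SKIP LOCKED  -- skip rows another worker has already locked
)
RETURNING id, payload
"""


def claim_batch(conn, batch_size=10):
    """Claim up to batch_size pending jobs and return them as (id, payload) rows.

    conn is assumed to be a DB-API style connection whose cursor works as a
    context manager.
    """
    with conn.cursor() as cur:
        cur.execute(CLAIM_SQL, (batch_size,))
        rows = cur.fetchall()
    conn.commit()  # releases the row locks taken by FOR UPDATE
    return rows
```

Each worker would call `claim_batch` in a polling loop, which is where the connection-count and polling load mentioned above comes from.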

5 comments

Is your comment referring to this project specifically?

Because the docs say:

  PgQue avoids that whole class of problems. It uses snapshot-based batching and TRUNCATE-based table rotation instead of per-row deletion.

It would be great if you could specify whether you had problems with the exact implementation linked by OP or were writing about something different, thanks!

Strange, you shouldn't have issues with vacuums on queue tables unless you're doing it wrong?

Were you not using partitions like this?

CREATE TABLE events_2026_04 PARTITION OF events FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');

CREATE TABLE events_2026_05 PARTITION OF events FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

https://www.postgresql.org/docs/current/ddl-partitioning.htm...

> Bulk loads and deletes can be accomplished by adding or removing partitions, if the usage pattern is accounted for in the partitioning design. Dropping an individual partition using DROP TABLE, or doing ALTER TABLE DETACH PARTITION, is far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.

It was a lot more annoying before pg 13 though, maybe you're just reminiscing about things from the 2010s?

    > Scaling the workers sometimes exacerbates the problem because you run into connection limits or polling hammering the DB
Design question here (not familiar enough with this approach with Pg)

Would an alternative be to have a small pool of pollers that would "distribute" the records to a later pool of workers instead of having workers directly poll?
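
A rough sketch of that shape, with the database left out: a small number of poller threads fetch batches and push them into a bounded in-memory buffer, and a larger pool of workers drains it. Everything here is hypothetical (`run_dispatch`, `fetch_batch`, `handle`, and the parameters); in a real setup `fetch_batch` would be the only code that touches Postgres, so the number of polling connections stays fixed no matter how many workers run.

```python
import queue
import threading


def run_dispatch(fetch_batch, handle, num_pollers=1, num_workers=4, buffer_size=100):
    """Fan jobs out from a few pollers to many workers.

    fetch_batch() returns a list of jobs, or an empty list when nothing is left.
    handle(job) processes one job and returns a result.
    """
    jobs = queue.Queue(maxsize=buffer_size)  # bounded, so pollers block when workers fall behind
    results = []
    results_lock = threading.Lock()

    def poller():
        while True:
            batch = fetch_batch()
            if not batch:
                # A long-running poller would sleep and retry here;
                # the sketch stops so that it terminates.
                return
            for job in batch:
                jobs.put(job)

    def worker():
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more work
                return
            out = handle(job)
            with results_lock:
                results.append(out)

    pollers = [threading.Thread(target=poller) for _ in range(num_pollers)]
    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in pollers + workers:
        t.start()
    for t in pollers:
        t.join()
    for _ in workers:
        jobs.put(None)  # one sentinel per worker
    for t in workers:
        t.join()
    return results
```

The trade-off is that jobs sitting in the buffer have already been claimed, so a crash of the dispatcher process loses them unless the claim has a timeout or is otherwise recoverable.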

"The vacuum pressure is real. "

Felt like an LLM for a second.

What kind of throughput are we talking about?