| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jashmatthews 672 days ago

Putting low throughput queues in the same DB is great both for simplicity and for getting exactly-once-processing.

Putting high throughput queues in Postgres sucks because...

No O(1) guarantee to get latest job. Query planner can go haywire.

High update tables bloat like crazy. Needs a whole new storage engine aka ZHEAP

Write amplification as every update has to update every index

LISTEN/NOTIFY doesn't work through connection pooling

3 comments

mickeyp 672 days ago

Update-related throughput and index problems are only a problem if you update tables. You can use an append-only structure to mitigate some of that: insert new entries with the updated statuses instead. You gain the benefit of history also. You can even coax the index into holding non-key values for speed with INCLUDE to CREATE INDEX.

You can then delete the older rows when needed or as required.

Query planner issues are a general problem in postgres and is not unique to this problem. Not sure what O(1) means in this context. I am not sure pg has ever been able to promise constant-time access to anything; indeed, with an index, it'd never be asymptotically upper bounded as constant time at all?

link

jashmatthews 671 days ago

By the time you need append-only job statuses it's better to move to a dedicated queue. Append-only statuses help but they also make the polling query a lot more expensive.

Deleting older rows is a nightmare at scale. It leaves holes in the earlier parts of the table and nerfs half the advantage of using append-only in the first place. You end up paying 8kb page IO costs for a single job.

Dedicated queues have constant time operations for enqueue and dequeue which don't blow up at random times.

link

felixyz 671 days ago

With a partitioned table you can painlessly remove old rows. Of course, you then have to maintain your partitions, but that's trivial.

link

jashmatthews 671 days ago

It's far from trivial. Autoanalyze doesn't work on partitioned tables, only on the partitions themselves. Partitioning a busy job queue table is a nightmare in itself.

link

iTokio 671 days ago

partitions are often used to drop old data in constant time.

They can also help to mitigate io issues if you use your insertion timestamp as the partition key and include it in your main queries.

link

jashmatthews 671 days ago

Yeah the ULID/UUIDs which can be be partitioned by time in this way are AWESOME for these use cases.

link

paffdragon 671 days ago

Indeed, that's my experience too. We used partitions like others mentioned below, but Postgres had issues with moving rows across tables atomically and had to implement our custom complex queries to overcome it. Plus job expiration was dynamic and had to use background cleaning. The bigger problem was with the planner not able to pick up sudden changes in volume and had to use a cron to run analyze on it. Managing retries with backoffs, etc.. At some point we stopped fighting it and just moved to SQS, we have zero problems since, no maintenence needed, and it's still free so we saved storage cost, time and developer effort for ongoing maintenance.

We still use Postgres for simple queues, but those don't really require a library as it's quite simple usually, with some advisory locks we can handle the crashed job unlocking fairly well too.

link

ulrikrasmussen 671 days ago

> LISTEN/NOTIFY doesn't work through connection pooling

What's the problem with using it with connection pooling?

link

jashmatthews 671 days ago

Best to just forget about it and listen on a connection not using pooling.

https://jpcamara.com/2023/04/12/pgbouncer-is-useful.html#lis...

link

jeeybee 671 days ago

asyncpg clears out any listeners you have setup once a connection is returned to pool. This will lead to 'missed' events. I guess its something of the same story with psycopg?

If event(s) any jobs will be picked up by the next event or by a timer that checks every 30 seconds or so (can be set by the dev.)

link