Hacker News new | ask | show | jobs
by fusiongyro 4953 days ago
The one thing that concerns me about using the database as a queue is that MVCC doesn't really lend itself to writing threading primitives like locks. I'm curious how one would go about writing a queue in an MVCC architecture--off the top of my head, I guess you could have a job assignments table to link processors and jobs, make the job FK unique and interpret forced rollbacks as indicating that another thread grabbed the job before you did. Then again, if your queue is only running idempotent functions it wouldn't matter if you had more than one thread doing the same work, it would just be a waste of time.
3 comments

This was one of the main reasons explained to me on why db based queues are a code smell.

The other was that insert/select pattern of a queue hammers a db's index and data page fragmentation algorithms. Anyone know if this is still the case or am I repeating a wives-tale?

There are so many factors that go into database performance, but I would expect high frequency queuing would be problematic even today for the reasons you mention. If the load is low it's probably fine.
I believe one can make a stored procedure that implements an atomic enqueue or dequeue operation in a transaction. Still, such systems seem to end up polling, or having triggers that launch non-database activities such as scripts or network activity.

Note that I'm not endorsing this scheme.

Postgres has non-classic-rdbms features suitable for lock/wait sort of behavior for a queue.

I haven't experimented with it myself, beyond to note it's there.

http://www.postgresql.org/docs/9.1/static/sql-notify.html

Prior to 9.1, you couldn't include information in the notification. So you could prevent the listening threads from polling the table looking for work, but you'd still have the race condition. I don't know whether being able to include information in the notification is sufficient to prevent the race condition either. I suppose if you notified with the ID of the entry, and the IDs are evenly distributed across integers you could do modular arithmetic with the thread IDs but you'll have to coordinate among the threads. I also don't know what sort of guarantees Postgres makes about notification delivery.

My intuition is that this isn't enough to bypass the problem, because it addresses the polling side rather than the enqueuing/dequeuing side. But that doesn't mean there isn't a clever way to put this to use.