by mbreese 5387 days ago
You know, I set something up in a very similar way a few years back for a client. It was a quick and dirty hack to get a processing queue up and running fast with low overhead on the server (a VM with no resources). The processing was to take a PDF that would appear in a directory and then email or fax it, depending on which directory it landed in.

I felt dirty while doing it, but didn't want to build up a whole ActiveMQ (or similar) queue solution - it was just overkill.

Six years out, that simple hack is still working today without needing any sort of maintenance.
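For anyone curious what this kind of directory queue can look like, here is a minimal sketch, not the original implementation. The subdirectory names ("email", "fax"), the `done` directory, and the handler callables are all assumptions made for illustration; a real setup would call an actual mailer or fax gateway and run `process_once` from cron or a sleep loop.

```python
import shutil
import tempfile
from pathlib import Path


def process_once(root, handlers, done_dir):
    """Scan each watched subdirectory once and dispatch any PDFs found.

    `handlers` maps a subdirectory name (hypothetical: "email", "fax") to a
    callable taking the PDF path. Processed files are moved into `done_dir`
    so a later pass does not pick them up again.
    """
    root, done_dir = Path(root), Path(done_dir)
    done_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for name, handler in handlers.items():
        inbox = root / name
        if not inbox.is_dir():
            continue
        for pdf in sorted(inbox.glob("*.pdf")):
            handler(pdf)
            # Prefix with the queue name to avoid clashes between inboxes.
            shutil.move(str(pdf), str(done_dir / f"{name}-{pdf.name}"))
            processed.append((name, pdf.name))
    return processed


def demo():
    """Run two passes over a throwaway directory tree with stub handlers."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "queue"
        for name in ("email", "fax"):
            (root / name).mkdir(parents=True)
        (root / "email" / "invoice.pdf").write_bytes(b"%PDF-1.4 placeholder")
        (root / "fax" / "order.pdf").write_bytes(b"%PDF-1.4 placeholder")
        sent = []
        handlers = {
            "email": lambda p: sent.append(("email", p.name)),
            "fax": lambda p: sent.append(("fax", p.name)),
        }
        first = process_once(root, handlers, Path(tmp) / "done")
        second = process_once(root, handlers, Path(tmp) / "done")
        return first, second, sent
```

The second pass in `demo()` finds nothing because the first pass moved the files out of the inboxes, which is the whole "queue" mechanism: the filesystem is the state.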


I suspect there's a large overlap between the people who would ridicule such an approach and the very people who find themselves in need of this article :)

A while back I looked at moving part of the queue into MySQL, but I got stuck while trying to keep it a polling-based system (I should have been able to accomplish this by having a MySQL trigger touch a file in the filesystem, which would trigger inotify / wake up the queue, but I couldn't get it to work as described in the docs). After reading the author's mention of PostgreSQL having some sort of listen/notify feature, I'll have to give that a look.

Here you go: http://www.postgresql.org/docs/current/static/sql-notify.htm...

I can't vouch for the performance characteristics, but it's got some nice features around how notification delivery interacts with transactions (notifications within an explicit transaction are not delivered until & unless the transaction commits successfully, order of notification from a single transaction is preserved), guaranteed delivery, and some degree of deduplication of identical notifications.
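A rough sketch of how a worker might wait on such notifications from Python, assuming a psycopg2-style connection (one that exposes `fileno()`, `poll()` and a `notifies` list) that is already in autocommit mode. The channel name `job_queue` in the usage note and both function names are made up for illustration.

```python
import select


def listen(conn, channel):
    """Subscribe this connection to `channel` with LISTEN.

    Channel names are SQL identifiers and cannot be passed as bound
    parameters, so the name is validated before being interpolated.
    """
    if not channel.isidentifier():
        raise ValueError(f"unsafe channel name: {channel!r}")
    cur = conn.cursor()
    cur.execute(f"LISTEN {channel}")
    cur.close()


def wait_for_notifications(conn, timeout=5.0):
    """Block for up to `timeout` seconds; return the payloads received.

    Sleeps in select() on the connection's socket instead of polling the
    database, then drains whatever notifications have arrived.
    """
    ready, _, _ = select.select([conn], [], [], timeout)
    if not ready:
        return []  # timed out, nothing arrived
    conn.poll()  # lets the driver read pending messages into conn.notifies
    payloads = []
    while conn.notifies:
        payloads.append(conn.notifies.pop(0).payload)
    return payloads
```

A worker would call `listen(conn, "job_queue")` once and then loop on `wait_for_notifications(conn)`, while the producer issues `NOTIFY job_queue` inside the same transaction that inserts the job, so the wake-up only goes out if the insert commits.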

However... PostgreSQL's "SELECT FOR UPDATE" seems to have significantly better performance than MySQL's version, most likely due to how concurrency & MVCC vs. locking interact.

A few years back, at a now mostly-defunct social network which shall remain nameless, I had to implement a cluster-wide work queue for sending out member emails. It couldn't involve installing new software, and there was no shared disk space to use for the directory-based style. A queue based on an existing PostgreSQL installation (the PG process had a 3 year uptime at that point) using "SELECT FOR UPDATE WHERE worker_id IS NULL LIMIT 1" followed by an immediate update of the worker_id and transaction end had quite good performance on mid-2000s hardware. As far as I could tell from my research then, the LIMIT 1 with no ordering clause locked only one row and concurrent processes each got a different one, so they didn't have to serialize on grabbing a job.

Definitely do your own research and testing, but in my experience SELECT FOR UPDATE, used carefully and with a thorough reading of the docs, is a much more viable solution on PostgreSQL than MySQL for a few hundred worker processes. I wouldn't try it for G+ or Twitter, but if you're dealing with more than the 50-100K daily active visitors and 25M or so customized emails that went out monthly, I suspect you know you're going to be putting in some extra engineering time. http://www.postgresql.org/docs/9.1/static/sql-select.html#SQ...
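The claim step described above could be sketched roughly like this. It is only an illustration under assumptions: the `jobs` table and its `id` column are invented (only `worker_id` comes from the comment), the connection is any DB-API connection using `%s` placeholders as psycopg2 does, and how concurrent workers behave against a locked row depends on the PostgreSQL version and isolation level, so test it against your own setup.

```python
# Hypothetical schema: jobs(id, worker_id, ...); worker_id is NULL until claimed.
CLAIM_SQL = "SELECT id FROM jobs WHERE worker_id IS NULL LIMIT 1 FOR UPDATE"
ASSIGN_SQL = "UPDATE jobs SET worker_id = %s WHERE id = %s"


def claim_job(conn, worker_id):
    """Claim one unassigned job for `worker_id`; return its id, or None.

    The row lock is held only between the SELECT and the commit, which
    mirrors the "immediate update of the worker_id and transaction end"
    pattern: lock a row, stamp it, release the lock.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        if row is None:
            conn.rollback()  # nothing to claim; end the transaction anyway
            return None
        job_id = row[0]
        cur.execute(ASSIGN_SQL, (worker_id, job_id))
        conn.commit()  # releases the row lock right away
        return job_id
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

The actual work (sending the email) happens after `claim_job` returns, outside the transaction, so a slow job never holds a lock. The trade-off is that a worker that dies mid-job leaves a row stamped with its `worker_id`, and something has to notice and reset it.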