Hacker News new | ask | show | jobs
by cellularmitosis 5387 days ago
I've been met with looks of disgust for using a filesystem to implement a queue, but I feel it's unjustified. A modern unix filesystem is surprisingly well suited to this task: You get atomicity "for free", inotify allows it to be interrupt driven rather than polled, it inherently supports multiple processes (thus different parts of the system can be implemented in different languages), there's no need for locking as long as you implement the queue using directories and 'mv', and it's extremely quick to implement, understand, and modify.

The only caveats are that of performance (with a traditional server I wouldn't worry about performance until you need to process hundreds of items per second, but on EC2 nodes that threshold is more near the range of dozens per second), and the need to regularly archive the "done" directory (cron solves this nicely).

7 comments

I recently read through news.arc (the source to Hacker News itself) and was dumbfounded by how such a simple, file-system backed system was able to cleanly and performantly handle many of the use cases of a document store or key value store. Are there any good resources on the DOs and DONOTs of building apps in this "Hey.... dummy.. Just use the file system!" -style?
Watch out for the 32,000 subdirectory limit. If your job tickets are complex enough to be implemented as a directory instead of a file, you'll get bitten by this (the number of files in a directory is only limited by the number of inodes in the entire filesystem).

If you are really lucky, and your tickets only need to represent a single piece of data (some sort of ID for example), you can just use the name of the file itself for the data storage and deal only with empty files. Because this only uses a single inode/block, it represents the best case scenario for speed and scalability in terms of the number of tickets which can accumulate before you need to archive. But more likely, you are going to have to worry about ticket namespace collisions (unless you have some sort of "set" like requirement where each ID can only be in the queue once at a time) which means you are using something like mktemp to create the file and then storing the ID inside the file.

Another key is to make sure you create new jobs in a "staging" dir, and then mv them into the "in" dir. Otherwise you have a race condition between your queuing system and whatever creates the tickets.

Here's a basic layout: /stage, /in, /active, /done. Some process on your system creates a ticket (which could be a single file or a dir) in /stage and then moves it into /in. This wakes up your queue, which moves it to /active when it starts processing it, and then moves it to /done and moves on to the next ticket in /in.

Another nice thing this gives you is that recovering from a crash / unclean state amounts to running ls on /stage, /in, and /active.

"32,000 subdirectory limit"

One top tip from personal experience is to make the resulting structure reasonably straightforward to browse manually - having huge numbers of subdirectories is going to be a barrier to this.

I can not find a reference to explicit "DOs and DONOTs", but you can surely gather experience from systems that have used this schema for a looong long time: mail handling systems.

For a quick start, I would look at the maildir specification, that includes instructions on how you should read form and write to maildir folders to avoid locking and get good performance: http://www.qmail.org/man/man5/maildir.html

Then, I would dive deeper by looking at the processes used to maintain the mail queues in qmail: http://www.qmail.org/qmail-manual-html/misc/INTERNALS.html . Obviously, you could also look at how postfix or exim handle their own queues.

Anyway, gathering all the experience buried in those systems and summarizing it in a logical way would make a great great article...

One big DONOT is: Don't do this if you need more than one physical host to be processing the jobs at the same time. Resiliency is hard to get right with shared filesystems.
You know, I set something up in a very similar way a few years back for a client. It was a quick a dirty hack to get a processing queue up and running fast with low overhead on the server (a VM with no resources). The processing was to take a PDF that would appear in the directory and then email or fax it depending on the directory.

I felt dirty while doing it, but didn't want to build up a whole ActiveMQ (or similar) queue solution - it was just overkill.

6 years out that simple hack is still working today without needing any sort of maintenance.

I suspect there's a large overlap between the people who would ridicule such an approach and the very people who find themselves in need of this article :)

A while back I looked at moving part of the queue into mysql, but I got stuck while trying to keep it a polling based system (I should have been able to accomplish this by having a mysql trigger touch a file in the filesystem, which would trigger inotify / wake up the queue, but I couldn't get it to work as described in the docs). After reading the author's mention of postgresql having some sort of listen/notify feature, I'll have to give that a look.

Here you go: http://www.postgresql.org/docs/current/static/sql-notify.htm...

I can't vouch for the performance characteristics, but it's got some nice features around how notification delivery interacts with transactions (notifications within an explicit transaction are not delivered until & unless the transaction commits successfully, order of notification from a single transaction is preserved), guaranteed delivery, and some degree of deduplication of identical notifications.

However... PostgreSQL's "SELECT FOR UPDATE" seems to have significantly better performance than MySQL's version, most likely due to how concurrency & MVCC vs. locking interact. A few years back at a now mostly-defunct social network which shall remain nameless I had to implement a cluster-wide work queue for sending out member emails that couldn't involve installing new software and had no shared disk space to use for that style. A queue based on an existing PostgreSQL installation (the PG process had a 3 year uptime at that point) using "SELECT FOR UPDATE WHERE worker_id IS NULL /LIMIT 1" followed by an immediate update of the worker_id and transaction end had quite good performance on mid-2000s hardware. As far as I could tell from my research then the limit 1 with no ordering clause locked only one row and concurrent processes each got a different one, so they didn't have to serialize on grabbing a job. Definitely do your own research and testing, but in my experience SELECT FOR UPDATE used carefully with a thorough reading of the docs is a much more viable solution on PostgreSQL than MySQL for a few hundred worker processes. I wouldn't try it for G+ or Twitter, but if you're dealing with more than the 50-100K daily active visitors and 25M or so customized emails that went out monthly I suspect you know you're going to be putting in some extra engineering time. http://www.postgresql.org/docs/9.1/static/sql-select.html#SQ...

This solution isn't necessarily terrible.

The only caveats are that of performance (with a traditional server I wouldn't worry about performance until you need to process hundreds of items per second, but on EC2 nodes that threshold is more near the range of dozens per second), and the need to regularly archive the "done" directory (cron solves this nicely).

...but why would you worry about these problems when other solutions like kestrel, beanstalk, and redis (my personal favorite) are equally easy to set up and understand?

And for that matter, how do you give multiple machines access to this workqueue?

As you point out, both of those are excellent points at which you should consider a "real" queuing system :)
Yeah, but why not just skip the intermediary step and use a "real" queueing system to begin with? It doesn't sound to me like it's any more effort in the short term or in the long term, and it's one less thing you have to worry about as you scale.
Gonna play devil's advocate here:

I think making files in a folder represents the least amount of effort for making a queue. So using one of the systems you described is necessarily more work.

I think the commenter outlined the reasons: any process can access the data with simple unix commands, and everyone understands files.

Plus files could be more efficient. What if the work unit you are processing are files? If the files are the work and the folder is the queue, you don't need any extra abstractions to access the data.

Because then you have to admin the real queueing system. If it's something simple, sometimes the one-time cost of re-solving the problem is less than the ongoing cost of dealing with that damn queueing system every time someone wants to set the app up on a new host, or a new dev wants to work with it, or it crashes, etc.

Fine line for when either approach is appropriate.

Beyond the novelty of doing it as a learning exercise the first time 'round, I agree that your approach is better where there's any expectation of scale.
Agreed++. I've done similar filesystem queues and have been told by corworkers "Yuck! That needs to be in a database!" ... so I ask why ... and the answer is "Because that's what databases are for!" Inevitably these cowrokers inherit the project, database every aspect of it, and then the app promptly collapses into a steady stream of downtime alerts. Yes, that's what databases are for: keeping DBA's gainfully employed.
Quick & dirty solutions like this often get dismissed out of hand but in practice something like this can be thrown together in a day but perform well enough to last until you know you've built something that merits a more robust implementation.

Unix's "everything is a file" philosophy can be stretched pretty damn far.

haha.. i've gotten those looks as well.

but i agree. files and folders are an elegant abstraction, that when combined with the unix toolset become extremely powerful.

The big shortcoming I see with this solution, and maybe this is what you are saying in the caveats, is that it doesn't support multiple worker boxes.

Of course you could use NFS, but this complicates it. Suddenly the consistency model is more complex and workers must partition work, and so on.. At that point, a mysql backed queue becomes an appealing and easy way to make a distributed queue.

My experience is that when something is filesystem based, you eventually have someone write a not-robust-enough bash script to do some maintenance operation (find|xargs|rm cleanup script, a sed based update script, etc) and it blows stuff up.

I think the transaction log and the forced structure of using SQL (barring some yutz carelessly using TRUNCATE) add some value managing the data, too. Not as big an issue where it's a single person maintaining the app.

i agree with your point. yet i've seen people make the same mistakes with sql too (they have autocommit=on haha).

hopefully whatever solution you have is tested and designed defensively so you don't accidentally rm the queue.

Those that don't undestand Unix ....

Blow their mind and show them join(1)

MySQL, nope I use postmap - http://www.postfix.org/postmap.1.html