| HN Mirror


by cellularmitosis 5434 days ago

Watch out for the 32,000 subdirectory limit (ext3, for example, caps a single directory at about that many subdirectories). If your job tickets are complex enough to be implemented as a directory instead of a file, you'll get bitten by this. The number of plain files in a directory, by contrast, is limited only by the number of inodes in the entire filesystem.

If you are really lucky, and your tickets only need to represent a single piece of data (some sort of ID, for example), you can just use the name of the file itself for the data storage and deal only with empty files. Because an empty file needs only an inode and no data blocks, this is the best-case scenario for speed and scalability in terms of the number of tickets that can accumulate before you need to archive. More likely, though, you will have to worry about ticket namespace collisions (unless you have some sort of set-like requirement where each ID can only be in the queue once at a time), which means using something like mktemp to create the file and then storing the ID inside the file.
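A minimal sketch of the mktemp approach in Python, assuming the standard library's tempfile.mkstemp as the stand-in for mktemp. The function names, the "ticket-" prefix, and the queue_dir argument are made up for illustration.

```python
import os
import tempfile


def create_ticket(queue_dir, job_id):
    """Create a uniquely named ticket file in queue_dir and store job_id in it."""
    # mkstemp creates the file exclusively (O_CREAT | O_EXCL), so two producers
    # never get the same file name, even when they enqueue the same ID.
    fd, path = tempfile.mkstemp(prefix="ticket-", dir=queue_dir)
    with os.fdopen(fd, "w") as f:
        f.write(job_id + "\n")
    return path


def read_ticket(path):
    """Return the job ID stored inside a ticket file."""
    with open(path) as f:
        return f.read().strip()
```

Enqueueing the same ID twice gives two distinct ticket files, which is the point of moving the ID out of the file name and into the contents.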

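Another key is to make sure you create new jobs in a "staging" dir, and then mv them into the "in" dir. Keep both dirs on the same filesystem so that the mv is an atomic rename. Otherwise you have a race condition between your queuing system and whatever creates the tickets: the queue can pick up a ticket that is only half written.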
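Roughly, the stage-then-mv step could look like this in Python. This is only a sketch: submit and its arguments are hypothetical names, and it assumes both directories live on the same filesystem and that the generated file names do not collide with tickets already in the "in" dir (os.rename silently replaces an existing file on POSIX).

```python
import os
import tempfile


def submit(stage_dir, in_dir, payload):
    """Write a ticket in stage_dir, then publish it into in_dir with one rename."""
    fd, staged = tempfile.mkstemp(prefix="ticket-", dir=stage_dir)
    with os.fdopen(fd, "w") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push the contents to disk before publishing
    final = os.path.join(in_dir, os.path.basename(staged))
    # rename is atomic when both paths are on the same filesystem, so anything
    # listing in_dir sees either no ticket or a complete one.
    os.rename(staged, final)
    return final
```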

Here's a basic layout: /stage, /in, /active, /done. Some process on your system creates a ticket (which could be a single file or a dir) in /stage and then moves it into /in. This wakes up your queue, which moves it to /active when it starts processing it, and then moves it to /done and moves on to the next ticket in /in.
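The consumer side of that layout might be sketched like this. Assumptions on my part: the queue polls /in rather than being woken up, tickets are taken in file-name order (not arrival order), and init_queue, process_next, and handler are hypothetical names.

```python
import os

STATES = ("stage", "in", "active", "done")


def init_queue(root):
    """Create the four state directories under root."""
    for state in STATES:
        os.makedirs(os.path.join(root, state), exist_ok=True)


def process_next(root, handler):
    """Claim one ticket from in/, run handler on it, then file it under done/.

    Returns the ticket name, or None if in/ is empty.
    """
    for name in sorted(os.listdir(os.path.join(root, "in"))):
        active = os.path.join(root, "active", name)
        try:
            # The rename is the claim: only one worker can win it.
            os.rename(os.path.join(root, "in", name), active)
        except FileNotFoundError:
            continue  # another worker claimed this ticket first
        handler(active)  # if this raises, the ticket stays in active/
        os.rename(active, os.path.join(root, "done", name))
        return name
    return None
```

Since os.rename works on directories as well as files, the same loop should cope with tickets that are directories, as long as everything stays on one filesystem.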

Another nice thing this gives you is that recovering from a crash / unclean state amounts to running ls on /stage, /in, and /active.
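The "ls" check is easy to script. A small sketch, with leftover_tickets as a made-up helper name; what you then do with the leftovers (requeue, inspect by hand, delete) depends on your jobs and is not decided here.

```python
import os


def leftover_tickets(root):
    """List what is sitting in stage/, in/, and active/ under root."""
    leftovers = {}
    for state in ("stage", "in", "active"):
        leftovers[state] = sorted(os.listdir(os.path.join(root, state)))
    return leftovers
```

My reading of the result: anything in stage was still being written by a producer, anything in in was queued but never started, and anything in active was in flight when things went down.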

1 comment

arethuza 5433 days ago

"32,000 subdirectory limit"

One top tip from personal experience is to make the resulting structure reasonably straightforward to browse manually - having huge numbers of subdirectories is going to be a barrier to this.
