by lyle_nel
3459 days ago
I have a small cluster of machines that I run experiments on. GNU parallel makes dispatching jobs to remote machines very easy. I also often use it to search for sequences by running grep in parallel, for example:

  $ parallel 'grep {1} haystack.txt' :::: many_needles.txt

where {1} is replaced by a single line of many_needles.txt. (Note that the pattern goes before the file name here: with `-f haystack.txt`, grep would read its patterns *from* haystack.txt and treat {1} as the file to search.)
It may seem like a trivial difference, but then you can search multiple haystacks at once fairly easily, and the approach scales to hundreds of millions of needles. The code for it isn't very difficult either; heck, you can just use an in-memory SQLite DB as a temporary, searchable index and rely on some of the most thoroughly tested software in history.
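A minimal sketch of the SQLite idea, using Python's built-in `sqlite3` module. It assumes needles are matched against *whole lines* (exact equality), which differs from grep's substring/regex matching. The function name `find_needles` and the table names are hypothetical and are not from the post. The needles go into an indexed table, every haystack line is tagged with its source, and a single join finds all hits across all haystacks at once:

```python
import sqlite3
from typing import Iterable, Mapping


def find_needles(
    needles: Iterable[str], haystacks: Mapping[str, Iterable[str]]
) -> list[tuple[str, int, str]]:
    """Return (haystack_name, line_number, line) for every haystack line that
    exactly equals a needle. Hypothetical helper; whole-line matching is an
    assumption (grep itself matches substrings/regexes)."""
    conn = sqlite3.connect(":memory:")  # temporary DB, discarded on close
    try:
        # PRIMARY KEY gives SQLite an index on the needles; OR IGNORE drops duplicates.
        conn.execute("CREATE TABLE needles (needle TEXT PRIMARY KEY)")
        conn.executemany(
            "INSERT OR IGNORE INTO needles VALUES (?)", ((n,) for n in needles)
        )
        # One table holds the lines of every haystack, tagged with their source.
        conn.execute("CREATE TABLE hay (source TEXT, lineno INTEGER, line TEXT)")
        for name, lines in haystacks.items():
            conn.executemany(
                "INSERT INTO hay VALUES (?, ?, ?)",
                ((name, i, line) for i, line in enumerate(lines, start=1)),
            )
        # A single join searches all haystacks for all needles.
        return conn.execute(
            "SELECT hay.source, hay.lineno, hay.line FROM hay "
            "JOIN needles ON hay.line = needles.needle "
            "ORDER BY hay.source, hay.lineno"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    needles = ["ACGT", "TTAA"]
    haystacks = {
        "a.txt": ["GGGG", "ACGT", "CCCC"],
        "b.txt": ["TTAA", "ACGTACGT"],
    }
    # Printed in grep -n style: file:line:text
    for source, lineno, line in find_needles(needles, haystacks):
        print(f"{source}:{lineno}:{line}")
```

With real files you would pass the lines with their trailing newlines stripped, e.g. `(l.rstrip("\n") for l in open(path))`. Because the exact-match assumption is built in, "ACGTACGT" in b.txt is not reported even though it contains a needle. Substring search would need a different query or index (SQLite's FTS5 extension is one option).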