by charltones
4164 days ago
There are command-line tools that help with the transition from a 'hack' one-liner to a more maintainable, supportable solution. For instance, drake (https://github.com/Factual/drake), a 'Make for data' that does dependency checking, would allow sensible restarts of the pipeline. The O'Reilly book Data Science at the Command Line (linked elsewhere in the comments) has a good deal to say on the subject: turning one-liners into extensible shell scripts, using drake, and using GNU Parallel.
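To make the "dependency checking allows sensible restarts" point concrete, here is a rough Python sketch of the general Make-style idea: a step is re-run only when its output is missing or older than its inputs. This is not drake's syntax or implementation; `is_stale`, `run_pipeline`, the two step functions and the file names are all hypothetical, made up for illustration.

```python
import os
import tempfile


def is_stale(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def run_pipeline(steps):
    """Run each (target, sources, action) step only when its target is stale.

    Returns the list of targets that were rebuilt on this run.
    """
    rebuilt = []
    for target, sources, action in steps:
        if is_stale(target, sources):
            action(target, sources)
            rebuilt.append(target)
    return rebuilt


def uppercase(target, sources):
    """Hypothetical step: write an upper-cased copy of the first source."""
    with open(sources[0]) as src, open(target, "w") as out:
        out.write(src.read().upper())


def count_lines(target, sources):
    """Hypothetical step: write the line count of the first source."""
    with open(sources[0]) as src, open(target, "w") as out:
        out.write(str(sum(1 for _ in src)))


def demo():
    """Run the pipeline three times and report what each run rebuilt."""
    with tempfile.TemporaryDirectory() as workdir:
        raw = os.path.join(workdir, "raw.txt")
        clean = os.path.join(workdir, "clean.txt")
        counts = os.path.join(workdir, "counts.txt")
        with open(raw, "w") as f:
            f.write("a\nb\n")

        steps = [(clean, [raw], uppercase), (counts, [clean], count_lines)]

        first = run_pipeline(steps)    # nothing exists yet
        second = run_pipeline(steps)   # everything is up to date
        os.remove(counts)              # simulate a failed or lost last step
        third = run_pipeline(steps)    # restart after the failure

        return [[os.path.basename(t) for t in run]
                for run in (first, second, third)]


if __name__ == "__main__":
    for run in demo():
        print(run)
```

The third run is the restart case: only the missing output is regenerated, and the earlier step is skipped because its output is still newer than its input. A real tool would handle much more than this, such as multiple outputs per step and failure handling.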
An excellent tool... apparently an improvement on xargs even for local parallel tasks (see http://unix.stackexchange.com/questions/104778/gnu-parallel-...)