by reacharavindh 2899 days ago
Just today, I used xargs instead of spending a lot of time building a batching script in Python. I wanted to launch a bunch of processes in a queue but only execute 10 of them in parallel at any time.

Here is a skeleton of what I came up with.

    find "$(pwd)" -mindepth 1 -maxdepth 1 -type d -name ".zfs" -prune -o -type d -print0 | xargs -0 -P 2 -I {} echo {}
where,

$(pwd) indicates the starting point of the listing of directories

-mindepth 1 makes sure the starting directory itself is not listed.

-maxdepth 1 keeps find from descending into subdirectories, so only the top level is listed

-type d - match only directories

-name ".zfs" -prune - if a directory is named .zfs (snapshot directories), skip it and do not descend into it

-o -type d - otherwise, match every other directory

-print0 - terminates each result with a NUL character instead of a newline. Plain -print prints one result per line, which breaks on names that contain newlines

xargs -0 reads that NUL-delimited input, so spaces or newlines in directory names are handled correctly

-P 2 - run up to two processes in parallel (raise it to 10 for the limit mentioned above)

-I {} tells xargs to replace {} in the command that follows with each item read from stdin, so echo {} becomes echo dir1, then echo dir2, and so on
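To see the pieces working together, here is a minimal sketch that runs the same pipeline against a throwaway directory. The scratch layout (dir1, "dir two", .zfs) and the variable names demo and result are made up for illustration; the sort is only there because the output order is not guaranteed when jobs run in parallel.

```shell
# Hypothetical scratch layout, purely for illustration.
demo=$(mktemp -d)
mkdir "$demo/dir1" "$demo/dir two" "$demo/.zfs"

# Same pipeline as the skeleton, pointed at the scratch directory.
# .zfs is pruned; the name with a space survives thanks to -print0 / -0.
result=$(find "$demo" -mindepth 1 -maxdepth 1 -type d -name ".zfs" -prune -o -type d -print0 |
  xargs -0 -P 2 -I {} echo {} | sort)
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -rf "$demo"
```

Only dir1 and "dir two" should be listed, each on its own line, with .zfs left out.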

That's just an example to show that we can do a lot with standard Unix tools before bringing in the external sophistication for data related tasks.
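For the original goal (a queue of jobs with at most 10 running at once), the same pattern might look like the sketch below. The workload is hypothetical: seq just generates 20 job numbers, and "sleep 1" stands in for whatever the real command is. The variable name out is also only for illustration.

```shell
# Hypothetical workload: 20 short jobs, at most 10 running at any moment.
# xargs starts a new job as soon as one of the 10 slots frees up.
# The job number is passed as $1 rather than spliced into the sh -c string.
out=$(seq 1 20 | xargs -P 10 -I {} sh -c 'sleep 1; echo "job $1 done"' _ {})
printf '%s\n' "$out"
```

The lines come back in completion order, not input order, so anything that depends on ordering has to sort or tag the output itself.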


And with GNU parallel, which can take the place of xargs, you can even distribute that job across multiple machines easily (as long as they're accessible by SSH).
Yes, I need to look into whether and how GNU Parallel queues up tasks when I restrict the number of parallel processes.

In my case, I was dealing with a FreeBSD server. I went the xargs route instead of installing something that is not available by default.
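On the queueing question: GNU Parallel's -j option caps the number of simultaneous jobs, and the remaining inputs wait until a slot is free, much like xargs -P. A rough sketch, assuming GNU Parallel is installed (the 20-job workload, the "sleep 1" placeholder and the variable name pout are made up for illustration); running across machines over SSH uses its --sshlogin (-S) option, which is not shown here.

```shell
# Only proceed if the installed "parallel" is GNU Parallel
# (the moreutils tool of the same name takes different arguments).
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # -j 10: at most 10 jobs at a time; the remaining inputs wait their turn.
  pout=$(seq 1 20 | parallel -j 10 'sleep 1; echo "job {} done"')
  printf '%s\n' "$pout"
else
  pout=""
  echo "GNU Parallel is not installed; skipping"
fi
```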