by int0x80 3672 days ago
>3) Multiprocessing

IMO shell makes it very easy to work with multiple processes (&). It's built in and natural.
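
As a rough illustration of the `&` point, here is a minimal sketch in POSIX sh. The `work` function and the temporary directory are hypothetical stand-ins for real commands, and `mktemp -d` is assumed to be available (it is common but not strictly POSIX).

```shell
#!/bin/sh
# Scratch directory for the jobs' output (assumes mktemp -d exists).
tmp=$(mktemp -d)

# Hypothetical worker: stands in for any long-running command.
work() {
    sleep 1
    echo "job $1 done" > "$tmp/$1.out"
}

# Each `&` starts the job in the background and returns immediately.
for i in 1 2 3; do
    work "$i" &
done

# With no arguments, `wait` blocks until all background children exit.
wait

results=$(cat "$tmp"/*.out)
echo "$results"
rm -rf "$tmp"
```

The three jobs run concurrently, so the whole script takes about one second rather than three.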

>4) Performance

If you are careful and know what you're doing, you can achieve very good performance with the shell. Usually, better performance comes from processing less data, i.e. being intelligent. It rarely depends on the language (unless you care about cycle-level performance, then yes :).

>6) Portability

I claim that it's way easier to depend on sh being on a (UNIX) system than $SCRIPTING_LANG.

>7) Documentation

?? You can mess up documentation in any language.

2 comments

Shell makes it easy to spawn multiple processes. It makes it reasonably easy to read those processes' standard out or standard error, though it's not that much fun to try to do both at the same time while keeping them distinct. [1]

It pretty much doesn't do anything else that you might want to do with multiple processes, though, and it tends to encourage multiple processes to communicate via text which is a problematic limitation that one often finds oneself "working around".

Shell is really powerful, but it hits a certain limit of what kind of tasks it can do, and it hits that limit hard. That's why, when one imagines orchestrating many processes on a machine to do some task, to say nothing of orchestrating many processes on many machines, one doesn't see solutions based on shell, and indeed the very idea is laughable. Shell is best used by making sure it stays firmly restricted to the domain it shines in, without so much as dipping a toe into the spaces where it does not.

[1]: Note "not much fun" != "can't". Shell is fundamentally written around the idea that a process has one stream STDOUT that may go to other processes, and one stream STDERR which is generally intended to go to the console (or other user output like a log) no matter how complicated the pipeline. While you can get both streams and do things to them, you're starting to fight shell, which really wants to create pipelines with one "through" path with no branches out.
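
To make the "not much fun" part concrete, here is one possible sketch of handling both streams while keeping them distinct, using an extra file descriptor. The `emit` function, the `ERR: ` prefix, and the file names are illustrative assumptions, not anything from the thread.

```shell
#!/bin/sh
# Hypothetical command that writes one line to each stream.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

tmp=$(mktemp -d)

# Inside the braces, fd 3 is bound to the "out" file.
# For emit: 2>&1 points stderr at the pipe, then 1>&3 moves stdout to
# fd 3, so only stderr flows through the pipe into sed.
{ emit 2>&1 1>&3 | sed 's/^/ERR: /' > "$tmp/err"; } 3> "$tmp/out"

out=$(cat "$tmp/out")
err=$(cat "$tmp/err")
rm -rf "$tmp"

echo "stdout was: $out"
echo "stderr was: $err"
```

It works, but the order of the redirections matters and is easy to get wrong, which is roughly what "fighting shell" looks like in practice.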

I think with the shell you have to adapt your abstractions to the "unix-way". For example, a queue to process will be a directory with N files, and each file can be processed in parallel by just something like "for f in dir/*; do process.sh "$f" & done;" but yeah ... it has limitations like everything.
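
A slightly fuller sketch of the directory-as-queue idea, under a few assumptions: the `process` function is a hypothetical stand-in for process.sh, the queue and output directories are temporary, and "processing" just upper-cases the file.

```shell
#!/bin/sh
queue=$(mktemp -d)      # the "queue": one file per work item
done_dir=$(mktemp -d)   # where results end up

# Fill the queue with four example items.
for n in 1 2 3 4; do
    echo "payload $n" > "$queue/item$n"
done

# Hypothetical stand-in for process.sh: handle one file, then
# remove it from the queue once the result has been written.
process() {
    tr '[:lower:]' '[:upper:]' < "$1" > "$done_dir/$(basename "$1")" && rm "$1"
}

# One background process per queued file, then wait for all of them.
for f in "$queue"/*; do
    process "$f" &
done
wait

remaining=$(ls "$queue" | wc -l | tr -d ' ')
processed=$(ls "$done_dir" | wc -l | tr -d ' ')
first=$(cat "$done_dir/item1")
rm -rf "$queue" "$done_dir"

echo "processed=$processed remaining=$remaining"
```

Note that this starts one process per file with no upper bound, so a large directory means a large number of simultaneous processes.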

With regard to (3), my problem in shell is that it is very hard to spawn children without risking overloading the machine.

What I would like in bash is some easy way to limit the number of background processes I can spawn, and to just wait when I try to start another one until an existing one is finished.

Some simple jobs can be converted to use xargs -P, but for more complex things I end up having to do them without parallelisation, so I don't end up spawning 100s of background processes and bringing my computer to its knees.
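
For the simple case, an xargs -P invocation might look like the sketch below. The job body (a sleep plus writing a marker file) and the limit of 4 are made up for illustration; `-P` is not in POSIX but is supported by both GNU and BSD xargs.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Feed eight items to xargs, one per invocation (-n 1), with at most
# four invocations running at the same time (-P 4).
# In the sh -c script, $1 is the output directory and $2 is the item
# appended by xargs; the "_" fills $0.
printf '%s\n' 1 2 3 4 5 6 7 8 |
    xargs -n 1 -P 4 sh -c 'sleep 1; echo "done $2" > "$1/$2"' _ "$tmp"

count=$(ls "$tmp" | wc -l | tr -d ' ')
rm -rf "$tmp"
echo "$count jobs completed"
```

xargs itself does the waiting: it only starts a new invocation when one of the four slots frees up, and it returns once every item has been handled.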

Yes ... I think that should not be allowed (bringing down the machine by a non-root user process). In Linux, cgroups (the cpu and memory controllers) can help, and the fair scheduler has improved the situation a bit from the old days, when a fork bomb would bring the machine down.

Limiting the number of spawned children is possible with fairly simple ad-hoc solutions, though I guess it depends on the specific problem.
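
One such ad-hoc approach, sketched in POSIX sh, is to start jobs in batches and wait for each batch before starting the next. The `task` function, the limit of 3, and the job count are all assumptions for the example. This is coarser than "wait until any one job finishes": a slow job holds up its whole batch.

```shell
#!/bin/sh
max_jobs=3            # assumed cap on simultaneous background jobs
tmp=$(mktemp -d)

# Hypothetical task: stands in for the real per-item work.
task() {
    sleep 1
    echo "task $1" > "$tmp/$1"
}

running=0
for i in 1 2 3 4 5 6 7; do
    task "$i" &
    running=$((running + 1))
    if [ "$running" -ge "$max_jobs" ]; then
        wait          # pause until the whole current batch has exited
        running=0
    fi
done
wait                  # collect the final, possibly partial, batch

finished=$(ls "$tmp" | wc -l | tr -d ' ')
rm -rf "$tmp"
echo "$finished tasks finished"
```

In bash 4.3 or later, `wait -n` returns as soon as any single background job exits, which gets closer to the behaviour asked for above; the batch version is shown here only because it runs in any POSIX shell.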