by gizi 3669 days ago
Bash is the love of my life! I have been working for years on this problem now (not full-time of course), gradually moving in the direction of finally being able to challenge this:

"Inappropriate use

shell is the main domain specific language designed to manipulate the UNIX abstractions for data and logic, i.e. files and processes. ... Correspondingly, please be wary of writing scripts that deviate from these abstractions, and have significant data manipulation in the shell process itself. While flexible, shell is not designed as a general purpose language and becomes unwieldy when ... "

Another person has actually solved the most important show stopper already: http://ctypes.sh.

What remains to be solved now are a few minor additional details, and then simply writing a good manual on how to use bash very successfully as a general-purpose language.

My personal belief is that everything that you can do in other scripting languages, you can also do in Bash, only better.

3 comments

Ok, I'll bite.

>> My personal belief is that everything that you can do in other scripting languages, you can also do in Bash, only better.

1) Native JSON, XML

2) Classes, namespacing, objects

3) Multiprocessing, multithreading

4) Performance

5) Package management

6) Portability

7) Documentation

8) Runtime debugging (!set -x)

I'm too tired to continue.

>3) Multiprocessing

IMO shell makes it very easy to work with multiple processes (&). It's built in and natural.
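A minimal sketch of the `&` pattern, for illustration only: start a few jobs in the background, remember each PID from `$!`, and `wait` on each one to collect its exit status. The `task` function and the statuses passed to it are hypothetical stand-ins for real work.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for real work: sleeps briefly, then exits
# with the status it was given.
task() {
    sleep 0.1
    return "$1"
}

pids=()
for status in 0 0 3; do
    task "$status" &        # run in the background
    pids+=("$!")            # $! is the PID of the most recent background job
done

failed=0
for pid in "${pids[@]}"; do
    # `wait PID` returns that job's exit status.
    wait "$pid" || failed=$((failed + 1))
done
echo "failed jobs: $failed"
```

Waiting on each PID individually is what lets the script tell which jobs failed; a bare `wait` would only block until all of them finish.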

>4) Performance

If you are careful and know what you're doing, you can achieve very good performance with the shell. Usually, better performance is achieved by processing less data, i.e. by being intelligent. It rarely depends on the language (unless you care about cycle-level performance, then yes :).

>6) Portability

I claim that it's way easier to depend on sh being on a (UNIX) system than $SCRIPTING_LANG.

>7) Documentation

?? You can mess up documentation in any language.

Shell makes it easy to spawn multiple processes. It makes it reasonably easy to read those processes' standard out or standard error, though it's not that much fun to try to do both at the same time while keeping them distinct. [1]

It pretty much doesn't do anything else that you might want to do with multiple processes, though, and it tends to encourage multiple processes to communicate via text which is a problematic limitation that one often finds oneself "working around".

Shell is really powerful, but it hits a certain limit of what kind of tasks it can do and it hits that limit hard, and that's why when one imagines orchestrating many processes on a machine to do some task, to say nothing of orchestrating many processes on many machines, you don't see solutions based on shell, and indeed the very idea is laughable. Shell is best used by making sure it stays firmly restricted to the domain it shines in and not so much as trying to dip a toe into the spaces where it is not.

[1]: Note "not much fun" != "can't". Shell is fundamentally written around the idea that a process has one stream STDOUT that may go to other processes, and one stream STDERR which is generally intended to go to the console (or other user output like a log) no matter how complicated the pipeline. While you can get both streams and do things to them, you're starting to fight shell, which really wants to create pipelines with one "through" path with no branches out.
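One way to keep the two streams distinct, sketched under the assumption that a temporary file is acceptable: capture STDOUT with command substitution and send STDERR to a file created by `mktemp`. The `noisy` function is a hypothetical command that writes to both streams.

```bash
#!/usr/bin/env bash
# Hypothetical command that writes one line to each stream.
noisy() {
    echo "result line"
    echo "warning line" >&2
}

err_file=$(mktemp)
out=$(noisy 2>"$err_file")   # STDOUT into a variable, STDERR into the temp file
err=$(<"$err_file")          # read the captured STDERR back
rm -f "$err_file"

printf 'stdout: %s\nstderr: %s\n' "$out" "$err"
```

It works, but the detour through a file is a fair illustration of the "fighting shell" point: the second stream has no natural place in the pipeline.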

I think with the shell you have to adapt your abstractions to the "unix-way". For example, a queue to process will be a directory with N files, and each file can be processed in parallel by just something like "for f in dir/*; do process.sh "$f" & done;" but yeah ... it has limitations like everything.

With regard to (3), my problem in shell is that it is very hard to spawn children without risking overloading the machine.

What I would like in bash is some easy way to limit the number of background processes I can spawn, and to just wait when I try to start another one until an existing one is finished.

Some simple jobs can be converted to use xargs -P, but for more complex things I end up having to do them without parallelisation, so I don't end up spawning 100s of background processes and bringing my computer to its knees.

Yes ... I think that should not be allowed (a non-root user process bringing down the machine). In Linux, cgroups (the cpu and memory controllers) can help, and the fair scheduler has improved the situation a bit from the old days when a fork bomb would bring the machine down.

But limiting the number of spawned children is possible with fairly simple ad-hoc solutions, though I guess it depends on the specific problem.
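One such ad-hoc solution, as a rough sketch: keep a counter of running jobs and call `wait -n` (which needs bash 4.3 or later) before starting a new one once the limit is reached. The `process` function, the `max_jobs` value, and the item list are all hypothetical; in practice `process` would be something like process.sh.

```bash
#!/usr/bin/env bash
# Requires bash 4.3+ for `wait -n`.
max_jobs=4                  # assumed limit; tune to the machine
log=$(mktemp)

# Hypothetical stand-in for the real per-item work.
process() {
    sleep 0.1
    echo "done: $1" >> "$log"
}

running=0
for item in a b c d e f g h i j; do
    if (( running >= max_jobs )); then
        wait -n             # block until any one background job exits
        running=$((running - 1))
    fi
    process "$item" &
    running=$((running + 1))
done
wait                        # wait for the jobs still running

completed=$(wc -l < "$log")
rm -f "$log"
echo "completed: $completed"
```

The counter never lets more than `max_jobs` children exist at once, which is the behaviour asked for above: try to start another job, and wait until an existing one has finished.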

my personal belief is that anything one can do in bash i can do in sh. not sure if that's really true in practice, but that's my belief. i never use bashisms because i do not know what they are or how to use them.

You're just having a laugh at the OP, right?

In case you're not, here are some "Bashisms" that really suck to be without:

* built-in regex support (e.g. `[[ $var =~ ^1\.2\.[34]$ ]]`)

* process substitution (e.g. `diff <(before_command) <(after_command)`) and all sorts of other redirection tricks

* indexed and associative arrays

Some of this can be worked around by shelling out to grep for regular expression matching or awk for arrays, but Bash makes things so much cleaner and more maintainable.
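For anyone who has not met them, the three features listed above look roughly like this. The sample version string, the two `printf` inputs, and the array contents are made up for illustration; associative arrays need bash 4 or later.

```bash
#!/usr/bin/env bash
# Built-in regex matching; capture groups land in BASH_REMATCH.
version="1.2.4"
patch=""
if [[ $version =~ ^1\.2\.([34])$ ]]; then
    patch=${BASH_REMATCH[1]}
fi

# Process substitution: diff the output of two commands without temp files,
# then count the lines that differ.
changes=$(diff <(printf 'a\nb\n') <(printf 'a\nc\n') | grep -c '^[<>]')

# Associative array (bash 4+), keyed by string.
declare -A shells=([freebsd]=sh [debian]=dash)
shells[qnx]=ksh

echo "patch=$patch changes=$changes shells=${#shells[@]}"
```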

Add to that better error handling.

AFAIK in POSIX /bin/sh it's not possible to detect if a process that writes into a pipe exits with an error status.

bash has "set -o pipefail" and "$PIPESTATUS" for that.
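A small sketch of the difference, using `false` as a stand-in for a pipeline stage that fails:

```bash
#!/usr/bin/env bash
# By default a pipeline's status is that of its last command,
# so the failure of `false` is hidden.
false | cat
default_status=$?

# PIPESTATUS holds the status of every stage of the last pipeline.
# Copy it immediately; the next command overwrites it.
false | true | cat
stages=("${PIPESTATUS[@]}")

# With pipefail, the pipeline fails if any stage fails.
set -o pipefail
pipefail_status=0
false | cat || pipefail_status=$?
set +o pipefail

echo "default=$default_status stages=${stages[*]} pipefail=$pipefail_status"
```

Note that `PIPESTATUS` is an array, so `"$PIPESTATUS"` on its own only gives the first stage; `"${PIPESTATUS[@]}"` gives all of them.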

But once you adopt bash-only features, you're losing the main argument for a shell script: portable scripting without the need to first install something to get something else running. Once you require Bash, it's equally easy to demand Perl and that will provide a much richer scripting experience.

When was the last time you logged into a box and /bin/bash wasn't an option? Let me guess: 1999, on a SPARC box running Sun Solaris?

If you limit yourself to either a popular Linux distro or one of two Unixes/Unix-likes, then bash can be available out of the box. Just as C is not C++, shell is not bash, so a shell script is one that runs on (d)ash, busybox, toybox, ksh, bash, zsh, etc., without modification. If it requires bash or zsh, then it's not a shell script but a bash/zsh/fish script. To name a popular non-Linux OS, take a fresh FreeBSD or OpenBSD install: no bash to be found unless installed via ports, and rightfully so. That said, I use bash myself all the time as an interactive shell, but /bin/sh is not bash. sh (including bash and zsh) is terribly hard to write correct and resilient scripts in, and even rc is much saner to script in.

> When was the last time you logged into a box and /bin/bash wasn't an option?

A few hours ago; and before that, last Wednesday. The latter says:

    JdeBP % /bin/bash
    zsh: no such file or directory: /bin/bash
    JdeBP %
The Z Shell is an add-on here, too. /bin/sh is the Korn shell. No, it's not OpenBSD.

today. the various BSD systems all come without bash in their default configurations (and it's never in /bin).

this should not be construed as an argument in support of /bin/sh.

On QNX all you get is ksh, which has little support for bash-isms.

Do embedded systems count? ;)

> My personal belief is that everything that you can do in other scripting languages, you can also do in Bash, only better.

Matrix multiplication?

Effective bash programming involves invoking other Unix commands. That's what the shell is for, after all. I don't know off the top of my head what you'd use for matrix multiplication; I guess it depends.
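As one possible answer in that spirit (a sketch, not a recommendation for serious numerical work), small matrices can be handed to awk. The `matmul` function name and the whitespace-separated, one-row-per-line input format are assumptions made for this example.

```bash
#!/usr/bin/env bash
# matmul FILE_A FILE_B: multiply two whitespace-separated matrices using awk.
# Hypothetical helper; assumes the column count of A equals the row count of B.
matmul() {
    awk '
        # First file: store matrix A.
        FNR == NR {
            for (j = 1; j <= NF; j++) a[FNR, j] = $j
            rows = FNR; inner = NF
            next
        }
        # Second file: store matrix B.
        {
            for (j = 1; j <= NF; j++) b[FNR, j] = $j
            cols = NF
        }
        END {
            for (i = 1; i <= rows; i++) {
                line = ""
                for (j = 1; j <= cols; j++) {
                    sum = 0
                    for (k = 1; k <= inner; k++) sum += a[i, k] * b[k, j]
                    line = line (j > 1 ? " " : "") sum
                }
                print line
            }
        }
    ' "$1" "$2"
}

# Usage: feed the two matrices in via process substitution.
product=$(matmul <(printf '1 2\n3 4\n') <(printf '5 6\n7 8\n'))
echo "$product"
```

The shell's part here is only the plumbing: it wires two inputs into a program that does the arithmetic.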

For another example, jq and jshon are excellent tools for querying, transforming, or creating JSON values.

My experience is that programming by composing Unix programs that communicate via pipes and files can be very pleasant and productive.

Sometimes I lament aspects of bash syntax and semantics, but it's also much more convenient than other scripting languages for tasks related to spawning programs and pipelines and working with files.

One of my recent projects that I've been having a lot of fun with is a kind of continuous integration system built mostly with bash scripts. It involves a lot of calls to git and Docker, process manipulation, file descriptor redirection, and file system manipulation—and bash makes all this pretty easy and concise.

You're missing out on Zsh ;)