by brandonbloom 4588 days ago
Maybe some controversial advice: Go ahead, fall in these pits.

I write my fair share of shell scripts and I've hit practically every one of these snags in the past. However, for the majority of tasks I perform with bash, I genuinely don't care if I support spaces in filenames, or if I throw away a little efficiency with a few extra sub-shells, or if I can't test numbers vs strings or have a weird notion of booleans.

Your scripts are going to have bugs. The important question is: What happens when they fail?

Are your scripts idempotent? Are they audit-able? Interruptible? Do you have backups before performing destructive operations? How do you verify that they did the right job?

For example, if your shell scripts operate only on files under version control, you can simply run a diff before committing. Rather than spend a bunch of time tracking down a word expansion bug, you can simply rename the one file that failed so that its name no longer includes a space.
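
To make the idempotent / backed-up / interruptible questions concrete, here is a minimal bash sketch. The name `safe_rewrite` and the `.bak` suffix are made up for illustration, and it assumes `mktemp`, `cmp`, `cp` and `mv` are available. The original file is only replaced at the very end, so an interrupted run leaves it untouched, and a second run that produces identical output does nothing.

```shell
#!/usr/bin/env bash
# safe_rewrite FILE CMD [ARGS...]
# Hypothetical helper: filter FILE through CMD, keeping a backup.
safe_rewrite() {
    local file=$1; shift
    local tmp
    # Temp file lives next to the target so the final mv is a plain rename.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # Run the command with FILE on stdin and the temp file on stdout.
    if ! "$@" < "$file" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Idempotence: if nothing changed, leave the file and any backup alone.
    if cmp -s "$file" "$tmp"; then
        rm -f "$tmp"
        return 0
    fi
    # Back up before the destructive step.
    cp -p "$file" "${file}.bak" || { rm -f "$tmp"; return 1; }
    # Note: the new file keeps mktemp's permissions, not the original's.
    mv "$tmp" "$file"
}
```

Usage would look like `safe_rewrite notes.txt tr 'A-Z' 'a-z'`; checking the result is then just `diff notes.txt.bak notes.txt`.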


I live and die by the shell. I'm constantly composing little one-liners, and keep an absurdly long Bash / zsh history to draw from. There are places the obvious answer is almost always "how about you just write a shell script?"

That said, I long ago reached a place where I realized that, while shell scripting is entertaining, I'd much rather write anything more than a handful of lines in a general purpose programming language. Perl, Python, Ruby, whatever - even PHP involves far less syntactic suffering and general impedance than Bash. It's not that I'm exceptionally worried about correctness in stuff that no one besides me is ever going to use, it's just that once you're past a certain very low threshold of complexity, the agony you spend for a piece of reusable code is so much less. Even just stitching together some standard utilities, there are plenty of times it'll take a tenth as long and a thousandth as much swearing to just write some Perl that uses backticks here and there or mangles STDIN as needed.

  > Are your scripts idempotent? Are they
  > audit-able? Interruptible? Do you have
  > backups before performing destructive
  > operations? How do you verify that they
  > did the right job?
Every single one of these questions is easier to answer if you're using a less agonizing language than Bash and its relatives.

> Every single one of these questions is easier to answer if you're using a less agonizing language than Bash and its relatives.

I disagree. While the set of things that are "hard" to do is probably larger in shell than the alternatives, the specific questions posed by the grandparent are hard in any language. They all boil down to "how can I correctly do something which has side effects (on external state)?"

Statefulness itself is a pain, and shell is in some sense the ultimate language for simply and flexibly dealing with external state.

Simplicity: the filesystem is an extremely simple and powerful state representation. Show me a language that interacts with the fs more concisely than

    tr '[A-Z]' '[a-z]' < upper.txt > lower.txt

Flexibility: if shell can't do it, just use another program in another language that can, like `tr` in the above example. What other language enables polyglot programming like this? Literally any program in any language can become a part of a shell program.

> it's just that once you're past a certain very low threshold of complexity, the agony you spend for a piece of reusable code is so much less.

Here's where I admit I was playing devil's advocate to an extent, because I fully agree with you here. I write lots of shell scripts. I never write big shell scripts. Above some length they just get targeted for replacement in a "real" language, or at the very least, portions of them get rewritten so they can remain small.

Empirically, it also seems true that shell is harder for people to grasp, harder to read, and harder for people to get right. These are real costs that have to be figured in.

PS. Speaking of shell, brennen, we should be working on our weekend project. :)

> tr '[A-Z]' '[a-z]' < upper.txt > lower.txt

That's the biggest problem: some things are very simple, but other things fall off a cliff. For example, a related task I ran into recently: how do you replace FOO with the contents of foo.txt? The natural way would be expanding it into a command line, but at least with sed that's no good even for nice short text files because / and \n are special. You can use a sed command to read a file, which I didn't know existed until I looked it up, but it apparently has the delightful feature that "If file cannot be read for any reason, it is silently ignored and no error condition is set." You can use perl... you can use perl to easily do a lot of things that are really hard to do otherwise (including things as simple as matching a regex and printing capture groups), but at least to me it feels really awkward and wrong to mix two different full-fledged languages. Maybe I should just get over that, but I wish the whole thing were more coherent.
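
For the FOO / foo.txt case, one possible bash-only sketch sidesteps sed's special characters by using parameter expansion, and fails loudly when the file can't be read. The function name `fill_template` is invented here, it assumes bash rather than plain sh, and it reads the whole template into memory, so it is only reasonable for small text files without NUL bytes.

```shell
#!/usr/bin/env bash
# fill_template TOKEN FILE < template
# Hypothetical helper: replace every literal TOKEN on stdin with FILE's contents.
fill_template() {
    local token=$1 file=$2 content template
    # Unlike sed's read command, refuse to continue if the file is unreadable.
    if [[ ! -r $file ]]; then
        printf 'fill_template: cannot read %s\n' "$file" >&2
        return 1
    fi
    # $(<file) drops trailing newlines, which suits splicing into a line.
    content=$(<"$file")
    template=$(cat)
    # Quoting "$token" makes it a literal string instead of a glob pattern;
    # quoting "$content" stops newer bash from treating & specially.
    template=${template//"$token"/"$content"}
    printf '%s\n' "$template"
}
```

Something like `fill_template FOO foo.txt < temp.txt` would then print the merged text, with slashes, backslashes and newlines in foo.txt passed through untouched.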

Interesting problem. Some quick head-scratching and googling didn't turn up anything useful on merging templates with awk and sed... then it hit me --- m4 is used for that:

   sed -r 's/FOO/include(foo.txt)/g' temp.txt |m4

Interesting solution; I should learn to use m4 for various tasks. Probably would have already if I didn't have such a negative visceral reaction to autotools :)

You can use cpp.

As for "capture groups", you can use lex. I wrote a "code generator" shell script to produce .l files and another script that compiles .l files to one-off utilities.

There is perhaps more coherence to the whole thing than you are aware of. Whether "Linux distros" or "OSX" have maintained that coherency I do not know.
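
On the capture-group point specifically, bash on its own can do a fair amount: `[[ string =~ regex ]]` fills the `BASH_REMATCH` array with the match and its groups. A small sketch, assuming bash 3 or later (the function name `print_group` is made up):

```shell
#!/usr/bin/env bash
# print_group REGEX
# Hypothetical helper: for each stdin line matching the extended regex,
# print the text of its first capture group.
print_group() {
    local re=$1 line
    while IFS= read -r line; do
        # Keep the regex in an unquoted variable so it is treated as a regex.
        if [[ $line =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
        fi
    done
}
```

For example, `print_group 'id=([0-9]+)' < log.txt` would print just the numbers. This is not portable to plain sh.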

You can use sed's external command mechanism to do that. This replaces lines of the form "include foo.txt" with the contents of foo.txt.

    sed 's/^include \(.*\)/cat "\1"/e'

Sounds useful, but it's not portable, and doesn't work on OS X. I suppose I could just switch to GNU sed, since I mostly care about interactive use, but thus far I haven't done so.

The polyglot aspect is not a virtue to me. Constantly serializing/parsing streams of text is pretty unpretty IMHO.

I totally agree. Sometimes the strength of making a 'quick bash script' is that you are making a quick bash script. I have made some pretty strong, well-tested projects in bash before, including one that was a big part of an open-source qmail project.

Sometimes, though, you just need to get stuff done with the least amount of fuss, without worrying about the extreme edge cases that the majority of the webpage attached to this story talks about. Heck, I'd say that's probably how the majority of bash work ends up.

On the other hand, if you examine the given examples, you'll see that very often the "correct" way isn't really longer or harder to type. If you make a point of sticking to the correct way, it will eventually become automatic, and there'll be no fuss and worry. This might end up saving some sorry ass later on.

And having learned the correct way, you'll instantly see it when a script you review is doing something in a way that will eventually bite someone.

There's no downsides to learning and doing things right. Of course it does take some extra time and effort at the start, as with everything.

Sure, there are extreme examples that can be really hard to handle portably and safely if you're doing something more complicated (embedded newlines in filenames come to mind). So in the end some corner cutting is often inevitable :-)

> There's no downsides to learning and doing things right.

There is never "no downsides".

For one example, there is "feedback fatigue". It's easy to pick on syntax or "small" semantics during a code review, but that's not nearly as useful as analyzing the big picture. I have been party to many code reviews that involved a dozen nit-picky stylistic comments. The reviewer feels like they did their job, the reviewee feels like they have appeased the reviewer, and so the now style-guide-correct code gets a half-hearted "LGTM" and is merged. That code looks great... And it handles filenames with spaces in them! But it does the totally wrong thing.