by imiric 1084 days ago
> Automation was glued together in one of these with a series of grep, awk, sed, ls, test, commands glued together. Anything more complicated was written in C and called from one of these things.

This doesn't sound that horrific to me. It's the classic Unix approach of building small tools that do one thing well, and composing them in novel ways to solve problems. For any problem that can't be solved this way you write another small tool using your programming language of choice. Rinse and repeat.

But occasionally Unix attracts users and programmers who reject this approach, and who prefer building a monolithic tool, or in the case of Larry Wall, new programming languages. To be clear, I'm a fan of Perl and think it has its place, especially in the era it came out. It inspired many modern languages, and its impact is undeniable.

Personally, I find solutions you refer to as "unmaintainable nightmare" to be simple and elegant, if used correctly. No, you probably shouldn't abuse shell scripts to build a complex system, and beyond a certain level of complexity, a programming language is the better tool. But for most simple processing pipelines, Unix tools are perfectly capable and can be used to build maintainable solutions.

The classic Knuth-McIlroy bout[1] comes to mind. Would you rather maintain Knuth's solution or McIlroy's?

[1]: https://matt-rickard.com/instinct-and-culture


I don’t think you’ve seen the kind of scripts the person you are responding to is talking about.

I have, and mentioned one lower down in the comments. Unix philosophy was great but does not scale well in terms of maintainability or efficiency. Invoking processes over and over again in loops is godawful slow. And the horror of complicated shell scripts is legendary.
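As a rough illustration of that per-iteration process cost (a sketch only; the function names and the `ERROR` pattern are invented for this example and do not come from any script mentioned in the thread), compare a loop that spawns `grep` for every line with a single `awk` pass:

```shell
# Slow pattern: starts one grep process for every input line.
count_errors_slow() {
  n=0
  while IFS= read -r line; do
    if printf '%s\n' "$line" | grep -q 'ERROR'; then
      n=$((n + 1))
    fi
  done
  printf '%s\n' "$n"
}

# Faster pattern: a single awk process scans the whole stream.
count_errors_fast() {
  awk '/ERROR/ { n++ } END { print n + 0 }'
}
```

Both print the same count for the same input; the second simply avoids starting a new process per line, which is where the time goes on large inputs.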

As a self-taught coder, I've experienced many times how highly skilled software engineers groan and sweat when they encounter shell scripts. I don't understand why, but it seems like people with a CS background are never really taught shell scripts and have come to irrationally fear them. It's sort of taboo.

This results in weird behavior, such as writing a groovy (Java?) script for Jenkins to execute bazel in order to build a go binary that runs the very same commands in an exec.Command() construct. Or people who download and import pandas to grab the third field in a csv file.

During the course of learning, I've naturally written code in bash that should have been written in another language. I replaced if statements with case because they turned out to be more performant. It's a great learning experience and why I got into python and go.

IMO we should use the right tool for the job. Sometimes that tool is a combination of unix utilities that you can put in a shell script for easier maintenance. It's just procedural execution of (usually very efficient) binaries, akin to a jenkins script or gitlab pipeline. Just mind the exceptions and use exit codes.

I'm the type of person that would grab pandas to parse a CSV. Here's my reasoning

* often times, it's not just the third column I want. Sometimes it becomes "third column unless the first column is 'b' then instead grab the fourth column". Having a good data representation makes sure that I'm not mixing logic code with representation parsing code

* I don't have to care about CSV parsing edge cases. Escaped comma? Quotes? I don't care, the library will either handle it or throw an explicit error. With custom parsing code, instead of an error, I'll get some mangled result in the middle of the file that I won't even catch / notice until later down the line

* when working with CSVs, in my area (ML / scientific compute), Python is often the right context to be in.
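
The quoted-comma case is easy to reproduce. A minimal sketch, with made-up sample rows and hypothetical function names, of what a naive comma split does to a quoted field:

```shell
# Hypothetical sample data: the second row has a quoted field containing a comma.
sample_csv() {
  printf '%s\n' 'id,name,city' '1,"Doe, Jane",Oslo' '2,Bob,Lima'
}

# Naive extraction: split on every comma, ignoring CSV quoting rules.
third_field_naive() {
  cut -d, -f3
}

sample_csv | third_field_naive
```

For the second row this yields ` Jane"` instead of `Oslo`, and `cut` still exits successfully, so nothing flags the mangled value.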

> I've experienced many times how highly skilled software engineers groan and sweat when they encounter shell scripts. I don't understand why, but it seems like people with a CS background are never really taught shell scripts and have come to irrationally fear them. It's sort of taboo.

It's not a lack of being "taught" shell scripts. It's the fact that shell programming constructs aren't well documented, your "standard library" is basically dependent on whatever binaries happen to be available on the filesystem, error handling is almost non-existent, etc.

It's very easy to write a bad shell script that "solves" a problem as long as a bunch of assumptions aren't violated. In my experience, senior software engineers are extremely averse to hidden assumptions and very concerned with reliability of the systems they build.

Yea... "Well the script works fine in MY SYSTEM" was the most common issue with said scripts. Running across different versions of Linux was fraught with issues, much less any other operating system that could execute a shell script.

Of course this can happen with any language, especially as it ages and adds complexity.

I certainly have, and might have written a few of those myself.

But this doesn't make this approach inherently wrong or obsolete. The programmer is wrong for trying to use the tools beyond their capabilities. Where that line is drawn is subjective, as is the concept of maintainability, but if you feel that you're struggling to accomplish something, and that it's becoming a chore to maintain, the path forward is choosing a more capable tool, like a programming language.

I think we are actually mostly in agreement there then.

Perl was invented because the gap from shell to more capable languages was (and is) really big. Languages like Python and Ruby didn’t exist yet, and Perl had a really, really strong sweet spot in text processing.

> Perl had a really, really strong sweet spot in text processing.

Still does.

Since Ruby took the best bits of Perl, what advantage does Perl retain?

>> what advantage does Perl retain?

Ubiquity, speed, and conciseness.

Perl is usually installed by default on Linux and Unix systems. Ruby might be there, it depends.

Perl is faster than Ruby. Ruby has been one of the slower scripting languages. But Ruby has been working on performance improvements in the past few releases. I have not seen any benchmarks of the current Perl versus the current Ruby, so this may have changed.

Perl is more concise than Ruby allowing more functionality for less code.

I used Perl then Ruby as my main language for almost a decade each. These days, I don't really write Ruby anymore; I moved on to Elixir and never looked back. But I still find myself using Perl on the command line, in contexts where Awk or Sed would also make sense. Ruby never optimized for the one-liner case IMO.

Yeah, the actual experience of leaky abstractions and non-portable code is forgotten. Perl solved a very real problem in the 90s. Grief, I shudder to think back to the sheer complexity of my bashrc file back then.

Yes, exactly. I have seen entire backend systems written in bash. Everything was shell script, sed and awk. The owner didn't want python or perl because he only knew bash and the related tools.

Everything was needlessly hard because these tools were not built for that. Easy to talk about philosophy and the "classic Unix approach" if you don't have to build modern applications this way.

Horror is apt.

You hit it on the head with the slowness of loops when the body comprises a series of program invocations. The horror really seeps in when you realize the original author wasn't stopped by the lack of data structures: they could get around that with some creative variable names.
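
For anyone who hasn't met the "creative variable names" trick, here is a sketch of how a lookup table gets faked in a shell without associative arrays. The `map_set`/`map_get` names are hypothetical, and keys must consist of valid identifier characters because they end up inside `eval`:

```shell
# Emulated "map": each key becomes its own variable named map_<key>.
# Store $2 under key $1.
map_set() {
  eval "map_$1=\$2"
}

# Print the value stored under key $1.
map_get() {
  eval "printf '%s\n' \"\$map_$1\""
}

map_set color red
map_set size "extra large"
map_get size
```

It works, but every access goes through `eval`, which is part of why scripts built this way are so unpleasant to maintain.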

Programming environments including shells and operating systems are just tools. And every tool can be misused. I opened a can of beans with a screwdriver once. Reality is messy. That doesn’t make the tool bad.
At the time, before everything became Linux, all these tools and the shells used to glue them together were an incoherent mess. Was your glue sh, ksh, csh, tcsh, bash or something uncommon like zsh? Did your grep, awk and sed use the same regexp syntax as your text editor? Single letter command line options, all meaning something different to each tool. Dozens of domain specific languages (shells, awk, sed etc.) meant dozens to learn and keep in your head. And you needed it in your head, because finding the information you needed in the massive single document man file reference was a pain because hypertext links had not been invented yet (well, probably in Emacs, which was another tool like Perl that people used to avoid the command line nightmare).
> or something uncommon like zsh

By the time our lord and savior zsh appeared on the scene Perl was already at Perl 3. And, to be fair, I do not think many used zsh before 2.1, which was sometime in the fall of 1991, and by then the Camel Book had been out for half a year or something like that. So the pre-Perl and the we-use-zsh days do not really overlap.

When Perl appeared the absolute hotness was the version of Korn shell that later became known as ksh88. https://github.com/weiss/original-bsd/tree/master/local/tool...

> And you needed it in your head, because finding the information you needed in the massive single document man file reference was a pain

That's what the O'Reilly books were for, especially the Nutshell series.

I remember capturing every password at my university via "methods". Because we had a printer quota. In the summer when everyone was gone I printed out all the man pages (all the mans, the system libraries, etc) so I'd have a nice reference book. I made sure to make it so no one was charged any money.

The one thing people can't possibly fathom if they started coding after the mid-late 90s was how much we relied on the printed medium.

I still remember when we measured the documentation IBM shipped with the mainframes not in pages but in yards it occupied on the shelves. It was a lot.

Oracle used to be hard on your lower back, within the last 25 years.

The Unleashed and Bible series of books come to mind. Waite group or Sams or so, publishers.

Rather Waite-y.

Honest question-- why weren't the tools glued together or perhaps replaced entirely with the lisp inside Emacs? What was it missing?

While there are many possible answers, it eventually boils down to Unix being a C runtime and thus having a C culture. Lisp is from outside of this section of the world, so it simply had less adoption and support inside Unix land. Other languages, like sed, awk, and shell, are not C but share its heritage (essentially, they were made by people close to the making of C).

Memory and CPU efficiency, since most systems were memory constrained. My first server had 128MB of RAM...

> My first server had 128MB of RAM

Whippersnappers! :D

The first big iron I had the luck to work with was an IBM 3090, essentially a gift from IBM, it handled the university entrance exams of the entire country of some ten million people and it had 64 MB of RAM. (It was also the first computer in Hungary permanently connected to the Internet via a leased line to Austria so it had an Austrian IP address. Hungary didn't have its IP region for two more years.)

I think the first machine with 128MB was a VAX 6510 a year or two later at another university. A little bit later, in 1994, CERN had gifted a VAX 9000 with an astounding 256MB of RAM.

To compare, the first server I installed Linux on had a grand total of 4MB RAM -- and that was one of the largest computers a small department at the university had.

It would be a long, long time before "128MB" and "mine" entered the same sentence.

Shades of the Monty Python sketch here, but the following is true...

4MB?!?

My first encounter with IBM kit was a, er, darn I'm not sure cuz I'm getting old, but I think it was a 4300? Not big iron in some senses, but still with a box that was something like 6-8 feet long iirc and definitely several feet wide and high. (And a bank of about 6-8 tape decks, each as tall as me, and two disk units, each the size of a washing machine, and so on.)

Its RAM? A massive 1 MB.

That IBM kit was the heart of the super new expensive upgrade in 1980 that cost something like 5-10 million pounds iirc to build, including a brand new building to house it and a team of programmers.

The older setup, which is where I was until its last days, was an ICL system that was expanded at the end of its life to a whopping 48KB -- yes, KB -- of RAM.

And that kit ran all the systems, internal (payroll, accounting, etc., etc.) and external (sales etc.) for the largest car dealership in the UK.

128MB? 4MB? Even 1MB? That was an unimaginably insanely large amount of RAM!

(Yes, it was very weird to be working with this physically enormous setup, and dealing with keeping it all cool enough not to halt for a half hour or so, through super human efforts when the A/C broke down, when the likes of PETs, Sinclair ZX80s, and Acorn Atoms were a thing...)

> including a brand new building to house it

Ha yes, the aforementioned IBM 3090 was so big that for installation they removed the roof of the building it was living in, craned it into place, and put the roof back. Bringing it up the elevator or stairs was impossible.

Much later, in the second half of the 90s, I remember the four of us carrying an IBM HDD -- I think it was your normal 5.25" drive but it needed four people because it was mounted on a vibration dampening base ...

There's an old joke that emacs stands for Eight Megs And Constantly Swapping.

Eventually Munches All Computer Memory.

Other backronyms too.

Elisp is very much a niche language. For whatever reasons, the use of Elisp outside of Emacs is basically non-existent. Elisp is quite clunky, and AFAIK there haven’t really been any big efforts to make it usable outside of Emacs. People who wanted Lisp outside of Emacs already had Common Lisp. (And Chez Scheme, and Scheme 48, etc etc.)

> you probably shouldn't abuse shell scripts to build a complex system, and beyond a certain level of complexity, a programming language is the better tool

but the only free programming languages available at the time were C/C++, various shells, and awk. everything else was expensive or not generally usable for other reasons. all the really useful languages to build complex systems didn't really appear or become freely available until the 90s. and perl was first among those.

I'm not saying that Perl didn't have its time and place. It certainly fulfilled a need at the time for a language more capable than shell scripts, but less cumbersome than C/C++.

But the thing is that today the shell landscape is much more mature for solving simple problems, and we have C/C++ alternatives that are saner and more capable than Perl (e.g. Go). So it arguably has lost its place, as shell tools are still in widespread use, while Perl is mostly underused. Raku is interesting, but it goes in a different direction, and its adoption is practically zero.

Python killed the game.

Unfortunately. Ruby should have been Perl's natural successor. Python is the VHS of scripting languages. For a start it doesn't have a decent answer to Perl or Ruby's one-liners. Then there's the crippled lambda implementation. Python is a sad case of worse is better.

> For a start it doesn't have a decent answer to Perl or Ruby's one-liners.

This is by design. Readability is core to the design and philosophy of python. One liners are cool and fun to write, but trying to decipher someone else's incredibly dense bash or perl one-liner is absolutely awful.

>> One liners are cool and fun to write, but trying to decipher someone else's incredibly dense bash or perl one-liner is absolutely awful.

You can write hard-to-read code in any programming language.

Python lets you do it with mandatory whitespace, so that the awfulness spans multiple lines instead.

Really talented Python programmers can do downright demonic stuff with list comprehensions.

Python appears to be simple, but is actually quite complex. I recommend reading "Effective Python" (https://effectivepython.com/) to see beneath the surface.

The readability complaint usually comes from people who never took the time to grok the language and its idioms. At least give the user the option. Advocating a language based on what it denies you doesn't make sense. Why use a scripting language at all if belt and braces is what you're looking for?

> This doesn't sound that horrific to me. It's the classic Unix approach of building small tools that do one thing well, and composing them in novel ways to solve problems.

This works really well if your problem can be solved in one or two liners.

It goes bad very quickly when, say, you have two CSV files and want to join them the sql-way. In sed, you have to use positional variables and think about shell escaping. In perl, you can at least name those variables and use \Q

> This works really well if your problem can be solved in one or two liners.

My personal comfort threshold is around the 100-line mark. It's even possible to write maintainable shell scripts up to 500 lines, but it mostly depends on the problem you're trying to solve, and the discipline of the programmer to follow best practices (use sane defaults, ShellCheck, etc.).

> It goes bad very quickly when, say, you have two CSV files and want to join them the sql-way.

In that case we're talking about structured data, and, yeah, Perl or Python would be easier to work with. That said, depending on the complexity of the CSV, you can still go a long way with plain Bash with IFS/read(1) or tr(1) to split CSV columns. This wouldn't be very robust, but there are tools that handle CSV specifically[1], which can be composed in a shell script just fine.
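
A minimal sketch of the IFS/read approach, assuming simple CSV with no quoted fields or embedded commas (the `third_column` name and the throwaway variable names are just for illustration):

```shell
# Print the third column of simple CSV read from stdin.
# Assumes no quoted fields and no embedded commas, so it is not robust.
third_column() {
  while IFS=, read -r _first _second third _rest; do
    printf '%s\n' "$third"
  done
}

printf 'a,b,c,d\n1,2,3\n' | third_column
```

Setting `IFS=,` only for the `read` command keeps the change from leaking into the rest of the script, and the trailing `_rest` variable soaks up any extra columns.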

So it's always a balancing act of being productive quickly with a shell script, or reaching out for a programming language once the tools aren't a good fit, or maintenance becomes an issue.

[1]: https://miller.readthedocs.io/

You’re discussing modern tooling in a conversation about early UNIX tooling. Back in the period being discussed, even ‘read’ was less functional. Ksh introduced a lot of the stuff we now take for granted, some of which wasn’t even available until Ksh93 (long after Perl was released). Bash itself is a younger project than Perl. Albeit not by much.

Fair point. I'm not arguing that Perl wasn't an improvement back then, but that the approach of composing Unix tools is not inherently bad. And as the shell ecosystem evolved since then, and more capable programming languages appeared, Perl has been left by the wayside as a historical relic, rather than the replacement of Unix tools that Wall envisioned.

So I don't disagree that it was needed back then, but it's important to mention the modern context it struggles to exist in.

Perl is still a commonly used tool chain. It is far from being a “historic relic”.

I agree that there’s nothing wrong with composing UNIX tools. I mean, that was one of its key selling points. If you watch any early promo videos for UNIX you’ll see them talk heavily about the composability of the command line and shell scripting. It wasn’t an accident — it was designed that way.

The point of the conversation wasn’t to say that one shouldn’t write shell scripts, it was just to say that there was a massive and unfilled gulf between what was easy to do in Ksh, awk and sed, and what could be done in C.

> It goes bad very quickly when, say, you have two CSV files and want to join them the sql-way.

Then just put them in a database and write a simple SQL query. If you use Perl it’s really very simple to do.

> It goes bad very quickly when, say, you have two CSV files and want to join them the sql-way.

That sounds like a perfect use case for `join`.
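
A hedged sketch of that `join` approach, assuming header-less files with no quoted commas and the key in the first column; `csv_inner_join` is a hypothetical helper name:

```shell
# Join two comma-separated files on their first column, like an SQL inner join.
# Both inputs are sorted on the key first, because join(1) expects sorted input.
# Assumes no header rows and no quoted fields containing commas.
csv_inner_join() {
  left_sorted=$(mktemp)
  right_sorted=$(mktemp)
  sort -t, -k1,1 "$1" > "$left_sorted"
  sort -t, -k1,1 "$2" > "$right_sorted"
  join -t, "$left_sorted" "$right_sorted"
  rm -f "$left_sorted" "$right_sorted"
}
```

Called as `csv_inner_join users.csv orders.csv`, it prints one line per matching key, with the key first, then the remaining fields of the left row, then those of the right row.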

> This doesn't sound that horrific to me.

I think it's possible that things that seem normal and inoffensive can become horrific simply from scale. You'll climb the stepladder without complaint, but then there's that radio tower in Canada...

Others like the idea of the family cow, then they see the 10,000 head feedlot from the highway.

Scale is sometimes sufficient by itself to induce horror.

csh was a decent interactive tool, but not great for scripting. Bourne shell had the right idea but there were so many bugs in various corners of it (I still sometimes end up writing `test "x$foo" = "xbar"` even though shells that need that are long gone).

If you can depend on a recent bash and use shellcheck, then it's actually quite a pleasant programming environment, with fewer footguns than one might think. (I want a @#$@# "set -e" equivalent that returns non-zero from a function if any statement in the function results in non-zero).

There are some things that are more awkward than they should be though (e.g. given a glob, does it match 0, 1, or many files, or the way array expansions work).
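
The glob-count case can be sketched in plain POSIX sh like this (the `count_matches` helper is hypothetical; with a recent bash one could instead enable `nullglob` and expand the glob into an array):

```shell
# Count how many paths match "$dir"/*"$suffix" (hidden files are not matched).
# An unmatched glob is left as the literal pattern, hence the existence check.
count_matches() {
  set -- "$1"/*"$2"
  if [ "$#" -eq 1 ] && [ ! -e "$1" ] && [ ! -L "$1" ]; then
    echo 0
  else
    echo "$#"
  fi
}
```

A caller can then branch on the result, for example `case "$(count_matches /tmp .log)" in 0) ... ;; 1) ... ;; *) ... ;; esac`, which is exactly the kind of ceremony the comment above is grumbling about.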

Also, there's no builtin way to manage libraries (I don't know about Perl, but Python suffers from this as well). This results in me pasting a few dozen lines of shell at the top of any of my significant shell scripts, for quality-of-life functions. Then I have to use "command -v" to check if the various external programs I'm going to use are present. Say what you will about C, but a statically-linked C program can be dropped in anywhere.
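
A small sketch of that `command -v` check, written as the kind of helper one might paste at the top of a script (the `require_commands` name and the message wording are assumptions, not anyone's actual library):

```shell
# Report each required command that is not on PATH.
# Returns non-zero if at least one is missing.
require_commands() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing required command: %s\n' "$cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A script would typically start with something like `require_commands awk sed join || exit 1`, so that it fails early instead of halfway through.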

Mostly agree. The modern shell scripting environment is much more robust than 30 years ago, with ShellCheck and some sane defaults, as you say. I also find it pleasant, once you get over some of its quirks.

As for managing libraries, that's true, but you can certainly import and reuse some common util functions.

For example, this is at the top of most of my scripts:

    set -eEuxo pipefail

    _scriptdir="$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")"
    source "${_scriptdir}/lib.sh"

This loads `lib.sh` from a common directory where my shell scripts live, which has some logging and error handling functions, so it cuts down on repetition just like a programming language would.

I haven't seen -E (aka -o errtrace) before, but it looks like a useful addition to the standard-ish -euxo pipefail.

Moreover, its existence being required explains why my error handling, recently, wasn't working as expected.

The effortless composition of complex commands out of simple standalone programs is one of the best features of Unix. And yes, I admire and love it as well.

That said, imagine a metrics system for a huge networking company that used these methods to cover all automated testing or defect analysis. Those inner loops were made of greps and seds and so forth, and each one is the invocation of a new program. It wasn't uncommon for these runs to take almost a day.

Besides performance, the other nightmare was what someone described below: each script was a one-off that didn't leverage the work from others. If the author only knew C shell, then you know you're going to be doing gymnastics to catch the stderr of some of those programs (you can't capture it in the same manner that Bourne variants do).
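
For contrast, a sketch of how Bourne-style shells capture only stderr (the `capture_stderr` and `noisy` names are hypothetical stand-ins):

```shell
# Run a command, discard its stdout, and emit only its stderr on stdout.
# Redirection order matters: 2>&1 first points stderr at the current stdout
# (the capture pipe), then >/dev/null sends the original stdout away.
capture_stderr() {
  "$@" 2>&1 >/dev/null
}

# Hypothetical command that writes to both streams.
noisy() {
  echo "normal output"
  echo "something went wrong" >&2
}

err=$(capture_stderr noisy)
printf 'captured: %s\n' "$err"
```

Here `err` ends up holding only the stderr line, which is the part that takes the gymnastics in C shell.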

Anyway, yes, we all adore the Unix philosophy, but there are limits.

That is the "philosophy" I never saw in commercial UNIXes, starting with my first experience with Xenix in 1993. It was only ever endlessly repeated in FOSS circles, which is ironic given how GNU and BSD applications work, with their endless number of command line parameters.