Hacker News new | ask | show | jobs
by mazambazz 703 days ago
Hard disagree. I've written plenty in both. They both have their strengths, but bash is just more efficient if you're working with the filesystem. The UNIX philosophy of "do one thing and do it well" shines here. Python is more powerful but it's a double-edged sword. If I want to read a file containing API endpoints, send a request to them for some JSON, and do some parsing, I don't want to need or want to deal with importing modules, opening file objects, using dictionaries, methods, functions, etc.

Why do that when I can literally just ``` N=0 while read -r URL; do curl "$URL" | jq '.data[].someProp' | grep -v "filter" > "$N.data" N="$((N+1))" done < ./links.txt ```

The other thing is bash makes it exceptionally easier to integrate across different system tools. Need to grab something from a remote with `rsync`, edit some exif, upload it to a CDN? That's 3 commands in a row, versus god knows what libraries, objects, and methods you need to deal with in Python.

3 comments

Libraries are nice, until you have to write the glue code between the modules and functions. But sometimes you already have the features you want as programs and you just need to do some basic manipulation with their arguments and outputs. And the string model can work well in that case.
> Why do that when I can literally just ``` N=0 while read -r URL; do curl "$URL" | jq '.data[].someProp' | grep -v "filter" > "$N.data" N="$((N+1))" done < ./links.txt ```

So that other people can read and modify this.

? That code's meaning is extremely clear: it reads from a list of URLs that return JSON lists of JSON objects. For each URL, it pulls out some property and checks whether each line does not contain the string 'filter'. Those lines which clear the 'filter'-filter are written to a file whose name is the line of the original input file which contained the URL pinged suffixed by the extension '.data'.

It's very easy to read and modify if you just write it out longwise, which is what I'd always do in some actual script. (I also like to put reading data at the beginning of the pipeline, so I'd use a useless use of cat here.) To illustrate:

  N=0
  cat links.txt \
    | while read -r URL; do
        curl "$URL" \
          | jq '.data[].someProp' \
          | grep -v "filter" \
          > "$N.data"
        N="$((N+1))"
      done
It's a very simple pipeline with a single loop. It's not very different from pipelines you might write to transform some data using a thread macro in Clojure, or method chaining against a collection or stream in OOP languages like Java or Scala or Ruby or whatever you like.
now add error handling.
That's really not that hard to add above. A lot of folks act like it's impossible to handle errors etc. in bash, but it's pretty straightforward -- certainly no more difficult than in any other language. The hard part, like with all languages, is deciding how to handle errors cases. The rest is just code.
> That's really not that hard to add above.

Then show how it is going to look like?

On mobile so no idea if this a) looks good or b) runs (especially considering the command substitutions, but you could also redirect to temp files instead), but it's just something like this:

    N=0
    while read -r URL; do
      data="$(curl "$URL")" || { printf 'error fetching data\n' && exit 1 ; }
      prop="$(jq '.data[].someProp' <<< "$data" || { printf 'error parsing JSON response\n' && exit 1 ; }
      grep -v "filter" <<< "$prop" > "$N.data" || { printf 'error searching for filter text\n' && exit 1 ; }
      N="$((N+1))"
    done < ./links.txt
bash also has e.g. functions, so you could abstract that error handling out. Like I said, not that weird.

Edit: oh good, at least the formatting looks ok.

You forgot "-f" flag to curl, which means it won't fail if the server returns an error. Also "jq" returns success on empty input, pretty much always. Together, this might mean that networking errors will be completely ignored, and some of your data files will mistakenly become empty when server is busy. Good luck debugging that!

And yes, you can fix those pretty easily.. as long as you aware about them. But this is a great illustration why you have to be careful with bash: it's a 6-line program which already has 2 bugs which can cause data corruption. I fully agree with the other commenter: switch to python if you have any sort of complex code!

plus the delightful mechanism of

    set -euo pipefail # stop execution if anything fails
    cleanup () {
        rm $tmp_file
        popd
        # anything else you want to think of here to get back to the state of the
        # world before you ran the script
    }
    trap cleanup EXIT # no matter how the program exits, run that cleanup function.
A really good rundown of the different options for set https://www.howtogeek.com/782514/how-to-use-set-and-pipefail...
But why bother? The moment you start doing all that, all the arguments of "oh look how much i can solve with my cool oneliner" goes away. The python version of that code is not only safe by default, it's also shorter and actually readable. Finally it is a lot more malleable, in case you need to process the data further in the middle of it.

    for N, url in enumerate(Path("links").read_text().splitlines()):
        resp = requests.get(url)
        resp.raise_for_status()
        prop = resp.json()["data"]["someProp"]
        matches = (line for line in prop.splitlines() if "filter" not in line)
        Path(f"{N}.data").write_text("\n".join(matches))
I'm sure there is something about the jq [] operator i am missing but whatever. An iteration there would be a contrived use case and the difficulty to understand it on a glance just proves I'm not interested. As someone else mentioned, both curl and jq requires some extra flags to not ignore errors, i can't say if that was intentional or not. It would either way be equally easy to solve.
I never said anything about one-liners?

    set -euo pipefail; N=0 while read -r URL; do curl "$URL" | jq '.data[].someProp' | grep -v "filter" > "$N.data" N="$((N+1))" done < ./links.txt