HN Mirror


by mazambazz 703 days ago

Hard disagree. I've written plenty in both. They both have their strengths, but bash is just more efficient if you're working with the filesystem. The UNIX philosophy of "do one thing and do it well" shines here. Python is more powerful, but it's a double-edged sword. If I want to read a file containing API endpoints, send a request to each of them for some JSON, and do some parsing, I don't need or want to deal with importing modules, opening file objects, using dictionaries, methods, functions, etc.

Why do that when I can literally just

```
N=0
while read -r URL; do
  curl "$URL" | jq '.data[].someProp' | grep -v "filter" > "$N.data"
  N="$((N+1))"
done < ./links.txt
```

The other thing is that bash makes it exceptionally easy to integrate different system tools. Need to grab something from a remote with `rsync`, edit some EXIF data, and upload it to a CDN? That's three commands in a row, versus god knows how many libraries, objects, and methods you'd need to deal with in Python.


skydhash 703 days ago

Libraries are nice, until you have to write the glue code between the modules and functions. But sometimes you already have the features you want as programs, and you just need to do some basic manipulation of their arguments and outputs. Bash's model of passing everything around as strings can work well in that case.


wiseowise 703 days ago

> Why do that when I can literally just
>
> ```
> N=0
> while read -r URL; do
>   curl "$URL" | jq '.data[].someProp' | grep -v "filter" > "$N.data"
>   N="$((N+1))"
> done < ./links.txt
> ```

So that other people can read and modify this.


pxc 701 days ago

? That code's meaning is extremely clear: it reads from a list of URLs that return JSON objects whose `data` field is a list of JSON objects. For each URL, it pulls out the `someProp` property of every element in that list and keeps only the lines that do not contain the string 'filter'. Those lines which clear the 'filter'-filter are written to a file named after the URL's line number in the original input file (counting from zero), with the extension '.data'.

It's very easy to read and modify if you just write it out longwise, which is what I'd always do in some actual script. (I also like to put reading data at the beginning of the pipeline, so I'd use a useless use of cat here.) To illustrate:

  N=0
  cat links.txt \
    | while read -r URL; do
        curl "$URL" \
          | jq '.data[].someProp' \
          | grep -v "filter" \
          > "$N.data"
        N="$((N+1))"
      done

It's a very simple pipeline with a single loop. It's not very different from pipelines you might write to transform some data using a threading macro in Clojure, or method chaining against a collection or stream in OOP languages like Java or Scala or Ruby or whatever you like.
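One detail worth knowing before modifying the `cat links.txt | while ...` form: in bash, each command in a multi-command pipeline runs in its own subshell (unless the `lastpipe` shell option is enabled), so changes the loop makes to `N` are gone after `done`. That's harmless above, where `N` is only used inside the loop, but it bites if you later want the count. A minimal sketch, with hypothetical function names and `printf` standing in for `links.txt`; the second version uses process substitution, which, like the `done < ./links.txt` redirect in the original one-liner, keeps the loop in the current shell:

```shell
#!/usr/bin/env bash
# Count lines with a while-read loop fed by a pipe. The loop runs in a
# subshell, so its increments to N are discarded when the loop ends.
count_lines_piped() {
  local N=0 line
  printf '%s\n' "$@" | while read -r line; do
    N="$((N+1))"
  done
  echo "$N"
}

# The same loop fed by process substitution instead of a pipe: it runs in
# the current shell, so N keeps its final value after `done`.
count_lines_redirected() {
  local N=0 line
  while read -r line; do
    N="$((N+1))"
  done < <(printf '%s\n' "$@")
  echo "$N"
}

count_lines_piped a b c       # prints 0
count_lines_redirected a b c  # prints 3
```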


qz_kb 703 days ago

now add error handling.


ryapric 703 days ago

That's really not that hard to add above. A lot of folks act like it's impossible to handle errors etc. in bash, but it's pretty straightforward -- certainly no more difficult than in any other language. The hard part, like with all languages, is deciding how to handle error cases. The rest is just code.


wiseowise 703 days ago

> That's really not that hard to add above.

Then show us what it would look like?


ryapric 703 days ago

On mobile so no idea if this a) looks good or b) runs (especially considering the command substitutions, but you could also redirect to temp files instead), but it's just something like this:

    N=0
    while read -r URL; do
      data="$(curl "$URL")" || { printf 'error fetching data\n' >&2; exit 1; }
      prop="$(jq '.data[].someProp' <<< "$data")" || { printf 'error parsing JSON response\n' >&2; exit 1; }
      grep -v "filter" <<< "$prop" > "$N.data" || { printf 'error searching for filter text\n' >&2; exit 1; }
      N="$((N+1))"
    done < ./links.txt

bash also has e.g. functions, so you could abstract that error handling out. Like I said, not that weird.

Edit: oh good, at least the formatting looks ok.
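Taking up ryapric's point that the error handling could be pulled out into a function, a minimal sketch might look like this. The `die` helper and the `process_links` wrapper are hypothetical names, not anything from the thread; error messages go to stderr so they can't end up in the `.data` files, and curl gets the `-f` flag that theamk's reply points out is missing.

```shell
#!/usr/bin/env bash
# die: print a message to stderr and exit with status 1. Because it calls
# exit, inside a ( subshell ) or $( command substitution ) it only ends that
# subshell; in the main loop below it ends the whole script.
die() {
  printf 'error: %s\n' "$*" >&2
  exit 1
}

# Hypothetical wrapper around the loop from the thread, with every failure
# check routed through die. Reads one URL per line from $1 (default ./links.txt).
process_links() {
  local N=0 URL data prop
  while read -r URL; do
    # -f: fail on HTTP errors; -sS: no progress meter, but still show errors
    data="$(curl -fsS "$URL")" || die "fetching $URL"
    prop="$(jq '.data[].someProp' <<< "$data")" || die "parsing JSON from $URL"
    # grep exits 1 when it selects no lines (everything was filtered out), so
    # only other statuses are treated as errors. A failed redirect can also
    # return 1, so this check will not catch that case.
    grep -v "filter" <<< "$prop" > "$N.data" || [ "$?" -eq 1 ] || die "filtering output of $URL"
    N="$((N+1))"
  done < "${1:-./links.txt}"
}
```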


theamk 702 days ago

You forgot the "-f" flag to curl, which means it won't fail if the server returns an HTTP error. Also, jq returns success on empty input, pretty much always. Together, this might mean that server errors will be completely ignored, and some of your data files will mistakenly become empty when the server is busy. Good luck debugging that!

And yes, you can fix those pretty easily, as long as you're aware of them. But this is a great illustration of why you have to be careful with bash: it's a 6-line program which already has 2 bugs that can cause data corruption. I fully agree with the other commenter: switch to Python if you have any sort of complex code!
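Both fixes theamk mentions correspond to real flags. curl's `-f`/`--fail` makes it exit with status 22 on HTTP responses of 400 or above instead of passing the error page along, and jq's `-e`/`--exit-status` makes jq exit 1 if its last output is `false` or `null`, and 4 if it produced no output at all. The sketch below wraps both in a hypothetical `fetch_prop` helper and adds a hypothetical `write_if_nonempty` guard, an extra safeguard (not from the thread) so an empty result never overwrites a data file:

```shell
#!/usr/bin/env bash
# Fetch a URL and extract .data[].someProp, failing loudly instead of silently.
#   curl -f : exit non-zero (22) on HTTP status >= 400 instead of printing the error page
#   curl -sS: hide the progress meter but still print error messages
#   jq -e   : exit non-zero if the last output is null/false, or nothing was produced
# Caveat: -e only looks at the LAST output, so a genuine null there also fails.
fetch_prop() {
  local json
  json="$(curl -fsS "$1")" || return 1
  jq -e '.data[].someProp' <<< "$json"
}

# Copy stdin into the file named by $1 only if stdin was non-empty, so a bad
# response can't silently replace a good file with an empty one.
# (The result keeps mktemp's 0600 permissions.)
write_if_nonempty() {
  local tmp
  tmp="$(mktemp)" || return 1
  cat > "$tmp"
  if [ -s "$tmp" ]; then
    mv "$tmp" "$1"
  else
    rm -f "$tmp"
    printf 'refusing to write empty %s\n' "$1" >&2
    return 1
  fi
}
```

In the loop these could be combined roughly as `fetch_prop "$URL" | grep -v "filter" | write_if_nonempty "$N.data"` under `set -o pipefail`. Note that the guard also refuses to write when `grep -v` filters out every line, which may or may not be what you want.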


1-more 702 days ago

plus the delightful mechanism of

    set -euo pipefail # stop on a failing command, an unset variable, or a failure anywhere in a pipeline
    tmp_file="$(mktemp)"
    cleanup () {
        rm -f "$tmp_file"
        # anything else you want to think of here to get back to the state of the
        # world before you ran the script (e.g. popd, if the script used pushd)
    }
    trap cleanup EXIT # no matter how the program exits, run that cleanup function.

A really good rundown of the different options for set https://www.howtogeek.com/782514/how-to-use-set-and-pipefail...
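The trap comment's "no matter how the program exits" includes exits triggered by `set -e`. A quick way to check is a throwaway subshell; `demo_cleanup` below is an illustrative name, and the `echo` in the trap is only there to make the cleanup visible:

```shell
#!/usr/bin/env bash
# Run a tiny "script" in a subshell: set -e aborts it at the failing
# command, but the EXIT trap still fires before the subshell exits.
demo_cleanup() {
  (
    set -euo pipefail
    tmp_file="$(mktemp)"
    trap 'rm -f "$tmp_file"; echo "cleaned up"' EXIT
    echo "working"
    false                 # set -e stops the subshell here...
    echo "never printed"  # ...so this line is never reached
  )
}

demo_cleanup  # prints "working", then "cleaned up"
```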


Too 702 days ago

But why bother? The moment you start doing all that, all the arguments of "oh look how much I can solve with my cool one-liner" go away. The Python version of that code is not only safe by default, it's also shorter and actually readable. Finally, it is a lot more malleable, in case you need to process the data further in the middle of it.

    from pathlib import Path

    import requests

    for N, url in enumerate(Path("links.txt").read_text().splitlines()):
        resp = requests.get(url)
        resp.raise_for_status()  # raise on HTTP 4xx/5xx instead of carrying on
        prop = resp.json()["data"]["someProp"]
        matches = (line for line in prop.splitlines() if "filter" not in line)
        Path(f"{N}.data").write_text("\n".join(matches))

I'm sure there is something about the jq `[]` operator I am missing, but whatever. An iteration there would be a contrived use case, and the difficulty of understanding it at a glance just proves I'm not interested. As someone else mentioned, both curl and jq require some extra flags to not ignore errors; I can't say if that was intentional or not. Either way, it would be equally easy to solve.


ryapric 701 days ago

I never said anything about one-liners?


cassianoleal 702 days ago

    set -euo pipefail; N=0; while read -r URL; do curl "$URL" | jq '.data[].someProp' | grep -v "filter" > "$N.data"; N="$((N+1))"; done < ./links.txt
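For reference, here is what `pipefail` changes in that one-liner, plus one gotcha. Without it, a pipeline's exit status is that of its last command, so a failed `curl` is hidden by a successful `grep`; with it, the pipeline's status is that of the rightmost stage that failed. But `grep -v` itself exits 1 when it selects no lines, so under `set -e` a response where every line contains "filter" would stop the loop. A minimal demonstration, with `false` standing in for a failed `curl` and illustrative function names:

```shell
#!/usr/bin/env bash
# Exit status of a pipeline whose first stage fails, without and with pipefail.
status_without_pipefail() {
  ( set +o pipefail; false | cat > /dev/null; echo "$?" )
}
status_with_pipefail() {
  ( set -o pipefail; false | cat > /dev/null; echo "$?" )
}

# grep -v exits 1 when every input line matches, i.e. nothing is selected.
grep_all_filtered_status() {
  printf 'filter me\nalso filter\n' | grep -v "filter" > /dev/null
  echo "$?"
}

status_without_pipefail    # prints 0
status_with_pipefail       # prints 1
grep_all_filtered_status   # prints 1
```

If an all-filtered result is legitimate, one option is to write that stage as `{ grep -v "filter" || [ "$?" -eq 1 ]; }`, which turns grep's "no lines selected" status into success while still failing on real errors (status 2).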
