Hacker News new | ask | show | jobs
by ryapric 696 days ago
On mobile so no idea if this a) looks good or b) runs (especially considering the command substitutions, but you could also redirect to temp files instead), but it's just something like this:

    N=0
    while read -r URL; do
      data="$(curl "$URL")" || { printf 'error fetching data\n' && exit 1 ; }
      prop="$(jq '.data[].someProp' <<< "$data" || { printf 'error parsing JSON response\n' && exit 1 ; }
      grep -v "filter" <<< "$prop" > "$N.data" || { printf 'error searching for filter text\n' && exit 1 ; }
      N="$((N+1))"
    done < ./links.txt
bash also has e.g. functions, so you could abstract that error handling out. Like I said, not that weird.

Edit: oh good, at least the formatting looks ok.

3 comments

You forgot "-f" flag to curl, which means it won't fail if the server returns an error. Also "jq" returns success on empty input, pretty much always. Together, this might mean that networking errors will be completely ignored, and some of your data files will mistakenly become empty when server is busy. Good luck debugging that!

And yes, you can fix those pretty easily.. as long as you aware about them. But this is a great illustration why you have to be careful with bash: it's a 6-line program which already has 2 bugs which can cause data corruption. I fully agree with the other commenter: switch to python if you have any sort of complex code!

plus the delightful mechanism of

    set -euo pipefail # stop execution if anything fails
    cleanup () {
        rm $tmp_file
        popd
        # anything else you want to think of here to get back to the state of the
        # world before you ran the script
    }
    trap cleanup EXIT # no matter how the program exits, run that cleanup function.
A really good rundown of the different options for set https://www.howtogeek.com/782514/how-to-use-set-and-pipefail...
But why bother? The moment you start doing all that, all the arguments of "oh look how much i can solve with my cool oneliner" goes away. The python version of that code is not only safe by default, it's also shorter and actually readable. Finally it is a lot more malleable, in case you need to process the data further in the middle of it.

    for N, url in enumerate(Path("links").read_text().splitlines()):
        resp = requests.get(url)
        resp.raise_for_status()
        prop = resp.json()["data"]["someProp"]
        matches = (line for line in prop.splitlines() if "filter" not in line)
        Path(f"{N}.data").write_text("\n".join(matches))
I'm sure there is something about the jq [] operator i am missing but whatever. An iteration there would be a contrived use case and the difficulty to understand it on a glance just proves I'm not interested. As someone else mentioned, both curl and jq requires some extra flags to not ignore errors, i can't say if that was intentional or not. It would either way be equally easy to solve.
I never said anything about one-liners?
There's real value in 'one-liners', though. A 'one-liner' represents a single pipeline; a single sequence of transformations on a stream of data— a single uninterrupted train of thought.

Chopping a script into a series of one-liners where every command in the pipeline but the first and/or last operate only on stdin and stdout, as far as is reasonable to do, is a great way to essentially write shell in an almost 'functional' style. It makes it easy to write scripts with minimal control flow and easy-to-locate effects.

Such scripts are ime generally much easier to read than most Python code, but I don't find Python especially easy to read.