

by EdSchouten 1636 days ago

I remember that when I worked at Google about a decade ago, there was this common saying:

"If the first version of your shell script is more than five lines long, you should have written it in Python."

I think there's a lot of truth in that. None of the examples presented in the article look better than they would have in an existing scripting/programming language. In fact, had they been written in Python or JavaScript, it would have been far more obvious what the resulting output would be, since those languages already use {} for objects and [] for lists.

For example, take this example:

    jo -p name=JP object=$(jo fruit=Orange point=$(jo x=10 y=20) number=17) sunday=false

In Python you would write it like this:

    json.dumps({"name": "JP", "object": {"fruit": "Orange", "point": {"x": 10, "y": 20}, "number": 17}, "sunday": False})

Only a bit more code, but at least it won't suffer from the Norway problem. Even though Python isn't the fastest language out there, it's likely still faster than the shell command above. There is no need to construct a fork bomb just to generate some JSON.
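As a rough illustration of the typing point, here is a minimal sketch using only the standard `json` module. In Python the literal itself says whether a value is a string or a boolean, so nothing has to be guessed from the text. The `sunday_text` and `country` keys are made up for this example.

```python
import json

# Types are explicit in the literal: "false" and "NO" stay strings,
# while False becomes the JSON boolean false.
payload = {
    "name": "JP",
    "sunday": False,         # boolean
    "sunday_text": "false",  # hypothetical key: a string, not coerced
    "country": "NO",         # hypothetical key: stays the string "NO"
}

encoded = json.dumps(payload)
print(encoded)
```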

14 comments

zimpenfish 1636 days ago

> Even though Python isn't the fastest language out there, it's likely still faster than the shell command above.

Taking these two command lines:

   jo -p name=JP object=$(jo fruit=Orange point=$(jo x=10 y=20) number=17) sunday=false >/dev/null

   python -c 'import json;print(json.dumps({"name": "JP", "object": {"fruit": "Orange", "point": {"x": 10, "y": 20}, "number": 17}, "sunday": False}))' >/dev/null

I ran 1000 iterations each of jo (x86_64, Rosetta2), python2 (x86_64, Rosetta2), jo (arm64), and python3 (arm64), with `tai64n` doing the timing.

    2022-02-05 21:25:38.357228500 start-jo-x86
    2022-02-05 21:25:45.319337500 stop-jo
    2022-02-05 21:25:45.319338500 start-python2-x86
    2022-02-05 21:26:18.876235500 stop-python2-x86
    2022-02-05 21:26:18.876235500 start-jo-arm
    2022-02-05 21:26:22.316063500 stop-jo-arm
    2022-02-05 21:26:22.316064500 start-python3-arm
    2022-02-05 21:26:40.379063500 stop-python3-arm

I make it: 7s for jo-x86, 33.5s for python2-x86, 3.5s for jo-arm, 18s for python3-arm.
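The arithmetic behind those figures can be reproduced from the log above. This is only a sketch: it copies the time-of-day part of each timestamp and works in integer nanoseconds to avoid rounding surprises.

```python
# Timestamps copied from the log above (time-of-day part only).
LOG = [
    ("jo-x86",      "21:25:38.357228500", "21:25:45.319337500"),
    ("python2-x86", "21:25:45.319338500", "21:26:18.876235500"),
    ("jo-arm",      "21:26:18.876235500", "21:26:22.316063500"),
    ("python3-arm", "21:26:22.316064500", "21:26:40.379063500"),
]

def to_ns(stamp: str) -> int:
    """Convert HH:MM:SS.nnnnnnnnn to integer nanoseconds since midnight."""
    clock, frac = stamp.split(".")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1_000_000_000 + int(frac)

# Elapsed seconds per run: stop minus start.
durations = {name: (to_ns(stop) - to_ns(start)) / 1e9 for name, start, stop in LOG}

for name, secs in durations.items():
    print(f"{name}: {secs:.3f}s")
```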

Test script is at https://pastebin.com/4tTVrDia

nousermane 1636 days ago

python3 is (relatively) slow to start up, and this is something that got significantly worse with the 2->3 migration:

  $ time python3 -c ''
  real    0m0.029s

  $ time python2 -c ''
  real    0m0.010s

  $ time bash -c ''
  real    0m0.001s

Which means you probably don't want Python scripts on a busy webserver being called from classic cgi-bin (do people still use those?), or run as the -exec argument to a "find" iterating over many thousands of files. There may be a couple more such examples. For most use cases, though, that's still fast enough.
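For anyone who wants to repeat the startup measurement without relying on the shell's `time`, here is a minimal sketch. It launches an empty program several times and keeps the best wall-clock time. The helper name `startup_seconds` is made up, and the figure includes process creation overhead, so it will not match `time` exactly.

```python
import subprocess
import sys
import time

def startup_seconds(argv, runs=5):
    """Return the best wall-clock time of `runs` launches of argv."""
    best = float("inf")
    for _ in range(runs):
        begin = time.perf_counter()
        subprocess.run(argv, check=True)
        best = min(best, time.perf_counter() - begin)
    return best

# sys.executable is whichever interpreter is running this script.
elapsed = startup_seconds([sys.executable, "-c", ""])
print(f"empty interpreter launch: {elapsed * 1000:.1f} ms")
```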

benibela 1635 days ago

I get 14,823s for python3 and 4,667s for jo on my system.

I also wrote my own tool, xidel [1]:

    time for i in $(seq 1 $count); do xidel -se '{"name": "JP", "object": {"fruit": "Orange", "point": {"x": 10, "y": 20}, "number": 17}, "sunday": false()}' > /dev/null; done

which gives me 1,575s

But if you actually want to repeat something a thousand times, you would use a loop in the query for 0,017s:

    time xidel -se 'for $i in 1 to 1000 return {"name": "JP", "object": {"fruit": "Orange", "point": {"x": 10, "y": 20}, "number": 17}, "sunday": false()}'  > /dev/null

(a python3 loop gives me 0,029s)

[1] https://videlibri.de/xidel.html

killingtime74 1636 days ago

what about how long for a human to read it and debug it when it gets beyond trivial?

zimpenfish 1635 days ago

Fair question. I think jo does tend to get more crufty if you're doing anything reasonably complex with multilevel structures, especially with arrays.

But jo does come into its own when you're wanting to use shell variables.

    > jo mypid=$$ set_or_not=$WEASEL

    > python -c 'import json,os;print(json.dumps({"set_or_not":os.getenv("WEASEL"), "mypid":os.getpid()}))'

bb88 1636 days ago

I'm not disagreeing that python is slow, but why would you choose to do either in a shell script?

    $ time cat<<EOL
    {"name": "JP", "object": {"fruit": "Orange", "point": {"x": 10, "y": 20}, "number": 17}, 
    "sunday": false}  
    EOL
    {"name": "JP", "object": {"fruit": "Orange", "point": {"x": 10, "y": 20}, "number": 17}, 
    "sunday": false}

    real 0m0.002s
    user 0m0.000s
    sys  0m0.002s

zimpenfish 1635 days ago

> why would you choose to do either in a shell script?

In the normal case, you'd have variables interpolated in there, not static JSON. And then you run into the quoting problems that jo was created to work around...
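The quoting problem can be sketched in a few lines. The value of `weasel` below is made up; it contains characters that are special in JSON, which is enough to break naive interpolation while a serializer handles it.

```python
import json

# A value containing double quotes and a backslash (made up for illustration).
weasel = 'say "hi" \\ bye'

# Naive interpolation: the embedded quotes produce invalid JSON.
naive = '{"set_or_not": "%s"}' % weasel
try:
    json.loads(naive)
    naive_is_valid = True
except json.JSONDecodeError:
    naive_is_valid = False

# Letting a serializer do the escaping round-trips the value unchanged.
safe = json.dumps({"set_or_not": weasel})
round_tripped = json.loads(safe)["set_or_not"]

print(naive_is_valid, round_tripped == weasel)
```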

EdSchouten 1636 days ago

Now put a thousand of those JSON objects in a list, invoking jo for every element.

zimpenfish 1636 days ago

That wasn't the claim made in the original post though, was it? The claim was that the Python snippet would be quicker than the jo snippet.

"Even though Python isn't the fastest language out there, it's likely still faster than the shell command above."

Which it most definitely is not - it's 5x slower.

(Probably not a huge issue in the real world if you're writing a shell script, mind, given that bash itself isn't a performance demon. But claims have to be tested.)

EdSchouten 1636 days ago

That’s because you’re making a false assumption about the environment prior to executing the statement.

If you are in a shell session and have to choose between executing python -c or calling jo, the latter is faster as you’ve demonstrated. But that’s not a realistic assumption.

Statements like these are almost certainly part of some combined work. The data you’re feeding to jo comes from somewhere. Its output is written somewhere.

You can’t convince me that if you’re already inside some Python script, that invoking json.dumps() is slower than calling jo from within a shell script.

At no point did I claim that launching Python AND running that json.dumps() is faster than running that shell command. I only stated that the json.dumps() is.

zimpenfish 1635 days ago

> if you’re already inside some Python script [...]

You're not going to shell out to `jo` and that's fine - it's not what `jo` was created for; it's explicitly a shell command to help you work around the annoyance of getting quoting right when constructing JSON from the command line (which I've had to do a lot and I'm pretty sure many people have to.)

> If you are in a shell session [and want to create JSON] ... that’s not a realistic assumption.

Of course it is. People create JSON in shell scripts all the time! That's why things like `jq` exist - because this is what people do!

KronisLV 1636 days ago

I actually did that for a more realistic comparison.

Example for jo:

  docker run --rm -it debian bash
  apt update && apt install -y jo nano
  nano bash-loop.sh && chmod +x bash-loop.sh
  
  #!/bin/bash
  for ((i=0;i<1000;i++)); 
  do 
     jo -p name=JP object=$(jo fruit=Orange point=$(jo x=10 y=20) number=17) sunday=false
  done
  
  time ./bash-loop.sh >/dev/null

Example for Python 3:

  docker run --rm -it debian bash
  apt update && apt install -y python3 nano
  nano python-loop.py
  
  import json
  for i in range(1000):
    print(json.dumps({"name": "JP", "object": {"fruit": "Orange", "point": {"x": 10, "y": 20}, "number": 17}, "sunday": False}))
  
  time python3 python-loop.py >/dev/null

Versions:

  Debian GNU/Linux 11 (bullseye)
  jo 1.3
  Python 3.9.2

Results for jo:

  real    0m2.230s
  user    0m1.106s
  sys     0m1.076s

Results for Python 3:

  real    0m0.027s
  user    0m0.021s
  sys     0m0.005s

So it seems like you're probably right about how individual invocations scale for larger amounts of invocations in non-trivial cases!

Note: jo pretty-prints because of the "-p" parameter, which Python does not do here, so this might not be a 1:1 comparison. It would be better to remove it, though when I did that the performance improvement was maybe 1%, not significant.

Admittedly, it would be nice to test with actually random data to make sure that nothing gets optimized away, such as just replacing one of the numbers in JSON with a random value, say, the UNIX timestamp. But then you'd have to prepare all of the data beforehand (to avoid differences due to using Python to get those timestamps, or one of the GNU tools), or time the execution separately however you wish.
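One way to do the "prepare all of the data beforehand" part on the Python side is sketched below. It assumes random integers are an acceptable stand-in for the varying value, and only the serialization is inside the timed region.

```python
import json
import random
import time

random.seed(0)  # fixed seed so runs are repeatable

# Build all inputs up front so data generation is outside the timed region.
payloads = [
    {
        "name": "JP",
        "object": {
            "fruit": "Orange",
            "point": {"x": 10, "y": 20},
            "number": random.randint(0, 2**31),
        },
        "sunday": False,
    }
    for _ in range(1000)
]

begin = time.perf_counter()
encoded = [json.dumps(p) for p in payloads]
elapsed = time.perf_counter() - begin

print(f"serialized {len(encoded)} objects in {elapsed:.6f}s")
```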

Edit to explain my rationale: Why bother doing this? Because I disagree with the sibling comment:

> The claim was that the Python snippet would be quicker than the jo snippet.

In my eyes that's almost meaningless, since in practice you'll actually care about runtimes when working with larger amounts of data, or with really large files. Therefore this should be tested, not just the startup times, which become irrelevant in most real-world programs, except for cases where you'd make a separate invocation per request, which you sometimes shouldn't do.

Edit #2: here's a lazy edit that uses the UNIX time and makes the data more dynamic, ignoring the overhead to retrieve this value, to get a ballpark figure.

Use time value for jo:

  jo -p name=JP object=$(jo fruit=Orange point=$(jo x=10 y=20) number=$(date +%s)) sunday=false

Use time value for Python 3:

  import time
  ...
  print(json.dumps({"name": "JP", "object": {"fruit": "Orange", "point": {"x": 10, "y": 20}, "number": int(time.time())}, "sunday": False}))

Results for jo:

  real    0m2.794s
  user    0m1.422s
  sys     0m1.313s

Results for Python 3:

  real    0m0.027s
  user    0m0.020s
  sys     0m0.006s

Seems like nothing changed much.

Edit #3: probably should have started with a test to verify whether the initially observed performance differences (Python being slower due to startup time) were also present.

Single iteration results for jo:

  real    0m0.003s
  user    0m0.000s
  sys     0m0.002s

Single iteration results for Python 3:

  real    0m0.022s
  user    0m0.017s
  sys     0m0.004s

Seems to also more or less match those results.

ZeroGravitas 1636 days ago

Often when writing scripts, I'm chaining tools together, e.g. using git to find a thing in a specific commit, using curl to grab something from the web, decoding some json, maybe unzipping a file.

I've never really found any language that feels good for that kind of thing. There's definitely a middle ground where it's getting too much for bash, but jumping to a language loses too much at the initial conversion to make it feel worth it until you are well past the point where your future self will think you should have made the switch.

Some languages have things like backticks in php to inter-operate but it's still not great experience to mix between them. For my own little things I'm currently looking at fish, but bash is omnipresent.

This tool seems to delay that point even further, as currently dealing with generating json is definitely a pain point (whereas manipulating it in jq is often really good).

But if anyone can point to good examples of this transition in python then I'd be very interested.

edit: jq is more powerful than I thought for creating json, see https://spin.atomicobject.com/2021/06/08/jq-creating-updatin...

bb88 1636 days ago

Typically if you have bash you have a bunch of other utilities installed too.

The problem with python here is that while python-sh might be nice, you have to install any extra libraries you need with it, and that's not a trivial problem for installing scripts into prod.

Xonsh is better since you kind of get both the benefits of python and a shell like language, but frankly it's broken in a number of ways still. I use it daily, but hopefully you don't need to ctrl-c out of something since signal handling is iffy at best. It is kind of nice to be able to import python directly on the command line and use it as python though...

Corrado 1635 days ago

Thank you for pointing out that jq can create JSON! I use jq all the time for working with the AWS CLI and a big pain point has always been sending JSON. If you don't know, the AWS CLI depends on JSON arguments for quite a few common tasks, and the JSON needed can be quite lengthy.

Up until now I've been creating temp JSON files to feed the commands and I thought jo would be a great tool to make this easier. Now that I know jq can also create JSON, I'll just use that instead.

Galanwe 1636 days ago

> there was this common saying:

> "If the first version of your shell script is more than five lines long, you should have written it in Python."

Seriously, these kinds of "common knowledge", "universal truth in a sentence" sayings are often the mark of wannabe guru mid career engineers that have no clue what they are talking about.

jsnelgro 1636 days ago

This is a toxic comment that adds nothing to the conversation.

goldenkey 1636 days ago

I don't know, the jo command seems a lot more readable to me than compressing json to a one line string. How is it toxic to call out a silly truism?

MathMonkeyMan 1635 days ago

I agree with Galanwe, but

> [...] are often the mark of wannabe guru mid career engineers that have no clue what they are talking about.

is the "toxic" part.

At least he says "are often," so as not to accuse the anonymous mid career Google engineer.

1vuio0pswjnm7 1636 days ago

"Even though Python isn't the fastest language out there, it's likely still faster than the shell command above."

That is going a bit far. By all means use Python. Go ahead and attack people who use the shell. But let's be honest. The shell is faster, assuming one knows how to use it. A similar claim is often made by Python advocates, something along the lines of Python is not slow if one knows how to use it.

The startup time of a Python interpreter is enormous for someone who is used to a Bourne shell. This is what always stops me from using Python as a shell replacement for relatively simple jobs; I have never written a large shell script and doubt I ever will. I write small scripts.

If anyone knows how to mitigate the Python startup delay, feel free to share. I might become more interested in Python.

Anyway, this "jo" thing seems a bit silly. Someone at Google spent their 20% time writing a language called jsonnet to emit JSON. It has been discussed on HN before. People have suggested dhall is a perhaps better alternative.

https://jsonnet.org

https://dhall-lang.org

michaelcampbell 1636 days ago

> it's likely still faster than the shell command above.

That's not a shell command any more than running "python" is. `jo` is its own executable.

And the sub-commands are not necessary; `jo` supports nested data natively.

jsnelgro 1636 days ago

I just assume at this point that a new python script from a coworker won’t run without an hour of tinkering and yelling obscenities at my screen. Or resorting to running in docker, which seems asinine. Python’s everywhere and does everything though, so I don’t have a good alternative. Shell scripts definitely aren’t it, but they generally hold up better when sharing in my experience.

Too 1636 days ago

Shell scripts can't declare dependencies though, in the same way a pip package can. A shell script using this tool requires one to manually apt install it first, or run in a common docker image - asinine. If you don't, your script will fail halfway through at runtime (actually, it likely will not fail, just produce corrupted output, since shell by default ignores errors), whereas a Python IDE or mypy will tell you about missing packages during analysis, before you try to build and run it.
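A script can at least check for its external tools up front instead of failing halfway through. The following is a minimal sketch using the standard library's `shutil.which`; the helper names `missing_tools` and `require_tools` are made up for this example.

```python
import shutil
import sys

def missing_tools(names):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]

def require_tools(names):
    """Exit with a clear message before doing any work if a tool is missing."""
    missing = missing_tools(names)
    if missing:
        sys.exit("missing required tools: " + ", ".join(missing))

# "jo" and "jq" are the tools discussed in this thread; adjust as needed.
print(missing_tools(["jo", "jq"]))
```

Calling `require_tools(["jo", "jq"])` at the top of a script would stop it before any work is done if either tool is absent.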

Besides that, looking at json only, it's part of the standard library so is more likely to already exist on any given machine rather than this.

c0npr 1636 days ago

That depends on how well your bash script is constructed. If you carefully handle the failing cases, such as missing commands, non-root permissions, etc., it can be easy to use and kind of portable. Of course, Python scripts have better error traces, so if the script doesn't work, others can debug it relatively easily.

pkulak 1636 days ago

Sure, but using Python gives you 100 more problems. Would you like to set pip and your package manager on a fight to the death? Or is today the day you learn all about venv? Might as well use Docker. Oh nice, my 600MB script is ready!

detaro 1636 days ago

That's not really an issue for small scripts, no.

deathanatos 1636 days ago

Since I have no idea why this is downvoted: it really isn't for small scripts: if you just ignore pip/packages, Python is going to give you way more functionality out of the box than shell would, with a lot fewer sharp corners that will translate into a less buggy/more correct script at the end of day.

If you do take the time to deal with pip (which, yes, is a problem) you get access to even more batteries that would have been a pain or just flat out impossible with shell.

(& on many distros, you can use your system package manager. I'm not seeing a material difference between a shell script that requires "apt-get install jo" and a Python script that requires "apt-get install python3-requests" or something.)

But either way, for circumstances where shell is the wrong tool for the job, "invoke python in the middle of this shell script because it's a better tool for this particular part of the job" is a strategy I've used before & will keep using, b/c it produces code that isn't riddled with bugs.
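As a sketch of what "invoke python in the middle of this shell script" might look like for the JSON case, the helper below turns `key=value` arguments into a JSON object. It is not how `jo` works: every value is kept as a string, and the name `pairs_to_json` is made up for this example.

```python
import json

def pairs_to_json(args):
    """Turn ["key=value", ...] into a JSON object string.

    Every value is kept as a string; no type guessing is attempted.
    """
    obj = {}
    for arg in args:
        # Split at the first "=" only, so values may contain "=".
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        obj[key] = value
    return json.dumps(obj)

# In a real script the list would come from sys.argv[1:], e.g. a shell
# script passing "user=$USER" "note=$NOTE"; literals are used here.
example = pairs_to_json(["user=alice", 'note=say "hi"'])
print(example)
```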

bb88 1636 days ago

Its behavior is really well defined and will stop on a syntax error, type error, or some other error that wasn't handled.

Bash on error will just move on to the next line as if nothing ever happened unless you:

    set -eu
    set -o pipefail
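On the Python side, the rough counterpart for external commands is `subprocess.run(..., check=True)`, which raises instead of carrying on. A minimal sketch, using a child process that deliberately exits with a non-zero status:

```python
import subprocess
import sys

# A child command that exits with a non-zero status (3 is arbitrary).
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]

try:
    # check=True turns a non-zero exit status into an exception,
    # so execution cannot silently continue past the failure.
    subprocess.run(failing, check=True)
    outcome = "continued"
except subprocess.CalledProcessError as err:
    outcome = f"stopped: exit status {err.returncode}"

print(outcome)
```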

skybrian 1636 days ago

The longest shell script I ever wrote was at Google, because at the time, my co-worker and I didn't know anyone with Python readability.

jasfi 1636 days ago

Although good advice, this is also an area where Nim could shine.

mmgutz 1636 days ago

If you're going to use a string literal, then

  echo '{"name": "JP", "object": {"fruit": "Orange", "point": {"x": 10, "y": 20}, "number": 17}, "sunday": false}'

is fine as a script

anamexis 1636 days ago

All fun and games until you need quoting. Quoting in shell scripts is already hellish enough, but layering JSON quoting on top of that is a road to madness.

mkdirp 1636 days ago

Combining here-string with `jq` (if you need variables), is plenty good enough. You'd need a single jq invocation.

anamexis 1636 days ago

Yes, `jq` would work fine, but so would `jo`. The point is, if there is anything more than the simplest dynamic values, constructing valid JSON just with POSIX shell is a huge pain.

nfw2 1636 days ago

I feel this whenever anyone advocates using jq. It boggles my mind that anyone would want to learn a whole new DSL for something that js makes trivial, especially considering js is a much more expressive scripting language anyway

viraptor 1636 days ago

JS is not that easy to embed in scripting though for trivial situations. After seeing a couple of jq examples I can run `jq '.foo[] | {bar, baz}'` without really "learning" its DSL. But doing the same with node? That would be much larger.
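For readers who have not seen jq before, the filter above does roughly the following, written out in Python: for each element of the `foo` array, keep only the `bar` and `baz` keys. The sample input is made up for illustration.

```python
import json

# Sample input; the field values are made up for illustration.
doc = json.loads("""
{"foo": [
  {"bar": 1, "baz": "a", "extra": true},
  {"bar": 2, "baz": "b", "extra": false}
]}
""")

# For each element of doc["foo"], keep only the "bar" and "baz" keys.
# jq emits one object per element; a missing key would come out as null,
# which dict.get mirrors by returning None.
projected = [{"bar": item.get("bar"), "baz": item.get("baz")} for item in doc["foo"]]

for obj in projected:
    print(json.dumps(obj))
```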

nfw2 1636 days ago

And now everyone who needs to read your script also needs to find those examples to figure out what your code does. The intent of even your not-real-world example isn't obvious to anyone unfamiliar with jq

viraptor 1636 days ago

That's about the only example you need explained to understand ~99% of real world jq usage. $dayjob has quite a bit of it around in various repos, and this actually is as real-world as it gets in my experience. Comparing that to having to learn enough JS to do the same thing in a verbose way, I'm still on the side of jq having an advantage in that case.

naniwaduni 1636 days ago

For programs this trivial, startup time so dominates runtime, and Python's startup time is so incredibly awful, that you can often fork 10-20 low-overhead processes before Python even starts executing user code.

motoboi 1636 days ago

Hard to write oneliners in python, though.

And this lets you write json without (or with less) braces, commas and quotes. That alone is already a big win.

synergy20 1636 days ago

that's probably because you were at google; however, there are more embedded devices on earth than whatever google has, times 1+ billion. in those devices python is too heavy, and a posix shell along with jo fits perfectly.