by cryptonector 1023 days ago
Processing JSON data is so much harder in Python than in jq that that is how I got into working with and on jq almost a decade ago.

Yeah, Python is like 10-20x the number of lines required to do the same thing as jq (especially with the boilerplate of consuming stdin), but that's also why it's more readable. But generally I agree - I would choose jq over some weird bash/python hybrid most of the time. I just wish it was more immediately readable.
Simple jq programs are easy to read because simple jq programs are just path expressions, and the jq language is optimized to make path expressions easy to read. Path expressions like

  .[].commit | select(.author == "Tom Hudson")
which basically says "find all commits by Tom Hudson" in the input.

`.[]` iterates all the values in its input (whether the input be an array or an object). `.commit` gets the value of the "commit" key in the input object. You concatenate path expressions with `|`, and array/object index expressions you can just concatenate w/o `|`, so `.[]` and `.commit` can be `.[] | .commit` and also `.[].commit`. Calls to functions like `select()` whose bodies are path expressions are.. also path expressions.
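
For readers who think in Python, here is a rough sketch of what that path expression does, step by step. It is only an illustration and not how jq is implemented; the names `iterate` and `commits_by` and the sample data are made up for the example.

```python
import json


def iterate(value):
    """Rough analogue of jq's `.[]`: yield every value of a list or a dict."""
    if isinstance(value, dict):
        yield from value.values()
    elif isinstance(value, list):
        yield from value
    else:
        raise TypeError(f"cannot iterate over {type(value).__name__}")


def commits_by(data, author):
    """Rough analogue of `.[].commit | select(.author == $author)`."""
    for element in iterate(data):        # .[]
        commit = element.get("commit")   # .commit (None when the key is missing)
        # select(...): pass the commit through only when the test holds.
        # Simplification: commits that are not objects are skipped here.
        if isinstance(commit, dict) and commit.get("author") == author:
            yield commit


# Hypothetical sample input.
SAMPLE = [
    {"commit": {"author": "Tom Hudson", "message": "fix typo"}},
    {"commit": {"author": "Someone Else", "message": "add feature"}},
    {"sha": "abc123"},
]

print(json.dumps(list(commits_by(SAMPLE, "Tom Hudson"))))
```

Like the jq version, the generator yields a stream of results instead of building a list up front.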

Perhaps the most brilliant thing about jq is that you can assign to arbitrarily complex path expressions, so you can:

  (.[].commit | select(.author == "Tom Hudson")) = "Anon"
The syntax is probably strange because of this effort to make path expressions so trivial and readable.
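
To make the assignment concrete, here is a minimal Python sketch of the result it produces: every `commit` value selected by the path is replaced with the new value, and everything else is left alone. It assumes the top-level input is a list of objects, and `assign_commits_by` is a made-up name for the example.

```python
import copy


def assign_commits_by(data, author, new_value):
    """Return a copy of `data` with each matching `commit` replaced by `new_value`."""
    result = copy.deepcopy(data)  # like jq, produce a new value and leave the input alone
    for element in result:
        commit = element.get("commit")
        if isinstance(commit, dict) and commit.get("author") == author:
            element["commit"] = new_value  # overwrite the value at the selected path
    return result


# Hypothetical sample input.
SAMPLE = [
    {"commit": {"author": "Tom Hudson"}},
    {"commit": {"author": "Someone Else"}},
]

print(assign_commits_by(SAMPLE, "Tom Hudson", "Anon"))
```

The jq one-liner gets this from the path expression itself, which is the part that takes a loop and an explicit copy in Python.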

jq programs get hard to read mainly when you go beyond path expressions, especially when you start doing reductions. The problem is that it resembles point-free programming in Haskell, which is really not for everyone.
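
For anyone unsure what "reductions" means here: a reduction folds a stream of values into a single accumulated result. jq has a `reduce STREAM as $x (INIT; UPDATE)` form for this; the closest Python equivalent is `functools.reduce`. A small sketch that counts commits per author (the function name and data are made up):

```python
from functools import reduce


def count_by_author(commits):
    """Fold a sequence of commit objects into a {author: count} mapping."""

    def step(counts, commit):
        author = commit.get("author", "unknown")
        # Return a new accumulator each time instead of mutating the old one.
        return {**counts, author: counts.get(author, 0) + 1}

    return reduce(step, commits, {})


# Hypothetical sample input.
COMMITS = [
    {"author": "Tom Hudson"},
    {"author": "Someone Else"},
    {"author": "Tom Hudson"},
]

print(count_by_author(COMMITS))
```

The accumulator is threaded through every step, which is exactly the part that reads less naturally than a plain path expression.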

The other thing is that jq is very much a functional programming language, and that takes getting used to.

Also, here’s something that seems not widely appreciated: You can write super clever unreadable one-long-line jq programs embedded in bash scripts (I hear you on the point-free thing), or you can write jq programs that live in their own files, with multiple lines, indentation, comments, and intermediate assignments to variables with readable names. I recommend the latter!
Wait, really? I had no idea you could do that. Might have to try that next time I'm tempted to break out Python or Node for a bash script.
I found a random example on GitHub for you. Search `path:jq$` for more.

https://github.com/flox/flox/blob/019095f8bc40e49abc8e5cd0b1...

  import json
  import sys

  data = json.load(sys.stdin)
  commits = [elt.commit for elt in data if elt.commit.author == "Tom Hudson"]
  json.dump(commits, sys.stdout)

Definitely not as straightforward... would be nice to have a bit more affordances for path expressions in Python.

That doesn’t quite work, because JSON objects are parsed to Python dicts, not Python objects with properties, so it would be:

  import json
  import sys

  data = json.load(sys.stdin)
  commits = [
    e["commit"]
    for e in data
    if e["commit"]["author"] == "Tom Hudson"
  ]
  json.dump(commits, sys.stdout)
This also won't work since it'll crash on missing fields. e.get("commit", {}).get("author", "") maybe (ignoring the corner case of non-list top level object).
Which is pretty useful: I get a malformed-JSON error as early as possible.

P.S. `some.get("A", {})["B"]` is a bad programming habit because there might be a list at `some["A"]`
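
If you do want the lenient behaviour, one way to get it without tripping over a list (or a string, or a number) sitting where a dict was expected is a small lookup helper. This is only a sketch; `dig` and `commits_by_author` are made-up names, not something from the standard library.

```python
def dig(value, *keys, default=None):
    """Follow `keys` through nested dicts; return `default` on any miss."""
    for key in keys:
        # Bail out on anything that is not a dict (list, str, None, ...)
        # instead of raising or indexing into the wrong type.
        if not isinstance(value, dict) or key not in value:
            return default
        value = value[key]
    return value


def commits_by_author(data, author):
    """Lenient filter: skip elements that don't have the expected shape."""
    if not isinstance(data, list):
        return []
    return [e["commit"] for e in data if dig(e, "commit", "author") == author]


# Hypothetical sample input with some malformed elements.
SAMPLE = [
    {"commit": {"author": "Tom Hudson"}},
    {"commit": ["not", "a", "dict"]},
    {"sha": "abc123"},
    "garbage",
]

print(commits_by_author(SAMPLE, "Tom Hudson"))
```

Whether to be lenient like this or to fail fast on malformed input is a design choice; the strict `e["commit"]["author"]` version is the right one when bad data should stop the script.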

You can do it like this with Jello (I am the author):

    jello '[e.commit for e in _ if e.commit.author == "Tom Hudson"]'
Jello lets you use Python syntax with dot notation without the stdin/stdout/json.loads boilerplate.

https://github.com/kellyjonbrazil/jello

This is a non-problem solved by the jq example. Clearly nobody sane writes (or consumes) APIs which sometimes produce array of object, sometimes produce singular objects of the same shape... Or maybe I'm spoiled from using typed languages and cannot see the ingenuity of the python/javascript/other-untyped-hyped-lang api authors that it solves?
> Clearly nobody sane writes (or consumes) APIs which sometimes produce array of object, sometimes produce singular objects of the same shape...

Has nothing to do with arrays, it has to do with the fact that Python dicts with string indexes and Python objects with properties are different things, unlike JS where member and index access are just different ways of accessing object properties.

> Or maybe I'm spoiled from using typed languages and cannot see the ingenuity of the python/javascript/other-untyped-hyped-lang api authors that it solves?

This isn't an untyped thing. JavaScript (and thus JSON) and Python both have type systems (even if they usually don't statically declare them), and those type systems, and thus the syntax around objects, differ between the two.

Oops, yep totally. Even more futzy! Think if I was doing this a lot I'd totally pull out one of those "dict wrappers that allow for attr-based access" that lots of projects end up writing for whatever reason
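
For reference, the kind of wrapper being described can be quite small. A minimal sketch, with `AttrDict` and `wrap` as made-up names and no claim that this matches any particular project's version:

```python
import json


class AttrDict(dict):
    """A dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so real dict
        # methods such as .items or .keys still win over same-named keys.
        try:
            return wrap(self[name])
        except KeyError:
            raise AttributeError(name) from None


def wrap(value):
    """Recursively wrap parsed JSON so nested objects allow attribute access."""
    if isinstance(value, dict):
        return AttrDict(value)
    if isinstance(value, list):
        return [wrap(item) for item in value]
    return value


# Hypothetical sample input.
TEXT = '[{"commit": {"author": "Tom Hudson"}}, {"commit": {"author": "Someone Else"}}]'

data = wrap(json.loads(TEXT))
commits = [e.commit for e in data if e.commit.author == "Tom Hudson"]
print(json.dumps(commits))
```

Because `AttrDict` subclasses `dict`, the result still serializes with `json.dumps`. The catch is keys that collide with dict method names, which still need `e["items"]`-style access.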
jmespath is your friend for this

    import jmespath
    import json
    import sys

    doc = json.load(sys.stdin)
    print(jmespath.search("[?commit.author == `Tom Hudson`].commit", doc))
I wish it had won over jq because JMESPath is a spec with multiple implementations and a test suite where jq is... well jq and languages have bindings not independent implementations.
`import jmespath` is a lot like importing jq...

> I wish it had won over jq because JMESPath is a spec with multiple implementations and a test suite where jq is... well jq and languages have bindings not independent implementations.

jq has multiple implementations too! In Go, Rust, Java, and... in jq itself.

So just picking Java https://github.com/eiiches/jackson-jq

> jackson-jq aims to be a compatible jq implementation. However, not every feature is available; some are intentionally omitted because thay are not relevant as a Java library; some may be incomplete, have bugs or are yet to be implemented.

Whereas JMESPath has fully compliant first-party implementations in Python, Go, Lua, JS, PHP, Ruby, and Rust, and fully compliant third-party implementations in C++, Java, .NET, Elixir, and TS.

Having a spec and a test suite means that all valid JMESPath programs will work, and work the same, anywhere you use them. I think jq could get there, but it doesn't seem to be the project's priority.

Repeating an identifier like this is inelegant; it should be (untested):

  [].commit | [?author == `Tom Hudson`]
jmespath does look like an interesting thing. Wish it weren't stringly-typed but that is a bit unavoidable.
I've found Ruby much nicer for writing dirty parsing logic like this in a "real" language; it lets you be more terse and "DRY" than Python. In bigger software projects that difference doesn't hurt me as much, but when I'm primarily trying to write something that would otherwise be well handled by SQL or jq, I've found Ruby to be the better middle ground.