by visarga 1621 days ago
JQ syntax feels too unusual: it doesn't resemble any code I know and gives me the feeling of looking at cryptic Perl or regex. I could never remember even the simplest things.

For example how would you take key k1 from a list of dicts [{k1: v1, k2: v2}, {k1: v3}]?

8 comments

> For example how would you take key k1 from a list of dicts [{k1: v1, k2: v2}, {k1: v3}]?

Do you mean something like:

    .[].k1

Give it a try.

https://jqplay.org/

jq does have a learning curve, but just like any query language, including SQL, first you need to learn the basics of the query language in order to get things to work.

In this case:

* you know that .[] iterates over the elements of an array, so you use it to unpack the root array,

* you know you get a stream of objects, thus from those you use the .k1 filter to get the values of each k1 key.

Here's jq's manual on basic filters: https://stedolan.github.io/jq/manual/#Basicfilters

After you get jq to filter out what you want, you can work on getting it to output results in whatever format you wish.

Well for comparison, I did that learning curve process with SQL and I was able to understand it. But I did the same learning curve process with JQ and I still don’t understand it.

> jq does have a learning curve, but just like any query language, including SQL, first you need to learn the basics of the query language in order to get things to work.

SQL is based on solid mathematical theory, relational algebra. I personally learned that (and tuple relational calculus) in college before learning SQL, which made it easier. It helps make it coherent. Is there something like this for jq? Often when people invent languages that are not based on solid theory, they tend to lack coherence. This can make learning them difficult if you're someone that relies on your mental model of how things "should" work, like I am.

> Is there something like this for jq?

It's a filter. You can name-drop math stuff and even mention monads and the like, but it's just predicates, maps, and reductions.

Also, I'm not aware of a single person who ever looked at relational algebra beyond the introductory lessons of a relational databases 101 course, and even then that stuff was mostly in the way.
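For what it's worth, the "predicates, maps, and reductions" framing can be sketched with jq's own builtins. The sample data below is made up for illustration; `select`, `map`, and `reduce` are standard jq filters, and `-c` only makes the output compact.

```shell
# Hypothetical sample data, not taken from the thread
data='[{"k1": 1, "k2": 2}, {"k1": 3}, {"k1": 10}]'

# Predicate: keep only the objects whose k1 is greater than 2
echo "$data" | jq -c 'map(select(.k1 > 2))'              # [{"k1":3},{"k1":10}]

# Map: transform every element
echo "$data" | jq -c 'map(.k1 * 2)'                      # [2,6,20]

# Reduction: fold the elements into a single value
echo "$data" | jq -c 'reduce .[] as $o (0; . + $o.k1)'   # 14
```

Each step is an ordinary filter, so they can be chained with `|` like any other jq expression.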

You are the one name-dropping three mathematical concepts though? I'm not sure I understand your reasoning here. I'm talking about the basis for jq in general, not just your example. And if your message is representative of how the people that created jq think, I guess the answer is no and jq falls into the "no solid theory behind it" category.

I don't think my message was implying that jq is a worse (or better) tool for it. I was just explaining that for some people, tools with a theory behind them are easier to learn and understand than tools without.

I agree that jq's query language is very obtuse and probably my biggest barrier towards learning it. I have found great mileage using gron [1], which is very different from jq, but its goal is to promote exploration of a JSON file through common unix tools such as awk and grep.

1: https://github.com/tomnomnom/gron

> I have found great mileage using gron

`gron` is great but doesn't seem to handle some (extreme-ish) situations that `jq` can, e.g. the json output from the fastnbt-tools. You either get a `token too long` error using `gron -s` because the input is too long (it's 90MB, that's fair) or you get only one set of outputs per key (iyswim) because they get overlapped in memory.

> or you get only one set of outputs per key (iyswim) because they get overlapped in memory

That sounds like a major bug. So it will silently skip data that you wanted?

> That sounds like a major bug.

It's definitely an oddness when you have multiple objects at the same level that aren't in an array but I guess the explanation there is "they should all be on their own individual lines as streaming json" which `gron` does handle correctly.

    (echo '{"a":"23"}'; echo '{"a":"25"}') | gron -s
    json = [];
    json[0] = {};
    json[0].a = "23";
    json[1] = {};
    json[1].a = "25";

> So it will silently skip data that you wanted?

Yeah.

    echo '{"a":"23"}{"a":"25"}' | gron
    json = {};
    json.a = "23";

The `-s` option doesn't help.

    echo '{"a":"23"}{"a":"25"}' | gron -s
    json = [];
    json[0] = {};
    json[0].a = "23";

Had a look at the source and I think I've figured out why and maybe how to fix it. Will have a bash at making a PR this week.

I want to vouch for gron as well. Apart from being grepable, I found it is easier to orient myself where I am in a very large JSON structure. The location in the hierarchy is present on every single line, no need to scroll up or down to figure it out. Granted, many other tools can help with this as well, but gron does it well.

Love this. Such a simple idea yet very helpful. It probably can't do what all jq does but it will solve most of what you usually want to do with json on the command line.

Thanks for that tip!

I've struggled with the jq language when doing complicated things, but generally felt it was just the problem that was tricky. Generally I feel like I'm learning an actually useful language, though I guess Perl and regex fall into that same category: what seems impenetrable at first later becomes almost second nature, as when you use $ to mean end of line in vi and so on. Then if you don't do it for a while, you forget the more obscure bits.

My approach to the example would be to use `.[] | .k1` which I think does what you want, and like bash command line pipes, you can build up to it semi-interactively.

The bits I struggle with JQ often involve irregular json, where a value might be missing, or null, or a list, not sure what the idiomatic way to deal with that is if there is one.
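One possible way to handle that kind of irregular input, as a sketch rather than the one true idiom: `?` suppresses type errors, `//` supplies a default when the left side is null, false, or produces nothing, and a `type` check lets a scalar and a list be treated alike. The sample data is invented for illustration.

```shell
# Hypothetical irregular input: missing key, null value, list value, non-object
data='[{"k1": "v1"}, {"k2": "x"}, {"k1": null}, {"k1": ["a", "b"]}, "not an object"]'

# Missing key, null value, and non-object input all fall back to a default
echo "$data" | jq -c '.[] | .k1? // "missing"'
# "v1"
# "missing"
# "missing"
# ["a","b"]
# "missing"

# Keep objects only, unpack list values, and drop nulls
echo "$data" | jq -c '.[] | objects | .k1 | if type == "array" then .[] else . end | select(. != null)'
# "v1"
# "a"
# "b"
```

Note that `//` also replaces a literal `false`, so it is not a good fit when `false` is a meaningful value.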

Had the same experience. That's why I've written jql[0], which puts a uniform lispy spin on CLI JSON processing. I now use it almost exclusively instead of jq. Check it out if you're looking for alternatives.

And by the way, you can achieve live preview with any of these CLI tools by using fzf. This is the snippet for jql for example: `echo '' | fzf --print-query --preview-window wrap --preview 'cat test.json | jql {q}'` (replace jql with jq or anything else)

P.S.: jql might seem dead, as there are no recent commits, but it's not. It's just finished.

[0]: https://github.com/cube2222/jql

`jql` looks interesting - is there an easy way to do the equivalent of `jq`'s `to_entries[]`? (e.g. turns `{"x":"y"}{"a":"b"}` into `{"key":"x","value":"y"}{"key":"a","value":"b"}` which I've needed a lot recently for dealing with output with unknown keys.)

For the general case of multiple keys and values - no. It sounds reasonable, though, so I'll think about whether to add an entries function or a map function that would allow doing this in a simple way.

For the special case you wrote as an example, where each object is just a single key-value, it's possible:

  (object
      "key" (pipe (keys) (0))
      "value" (pipe ((keys)) (0)))

> I'll think about whether to add an entries function or a map function that would allow doing this in a simple way.

That would be super, ta. `to_entries[]` is pretty much the major reason I've not managed to move off `jq` to anything else yet because it's just incredibly powerful in this situation.
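For readers who haven't met it, this is roughly what `to_entries[]` does with objects whose keys aren't known in advance. The first input is the example from the question above; the second, multi-key object is made up for illustration.

```shell
# Two concatenated objects on stdin; jq processes each one in turn
echo '{"x":"y"}{"a":"b"}' | jq -c 'to_entries[]'
# {"key":"x","value":"y"}
# {"key":"a","value":"b"}

# With several keys, each key/value pair becomes its own object,
# which can then be addressed uniformly via .key and .value
echo '{"a": 1, "b": 2}' | jq -c 'to_entries[] | "\(.key)=\(.value)"'
# "a=1"
# "b=2"
```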

I've actually just gone ahead and added a way to do this - the zip function - in the v0.2.0 release.

The relevant jql snippet to solve this in the general case now is:

  (pipe
    (zip
      (keys)
      ((keys)))
    ((keys)
      (object
        "key" (0)
        "value" (1))))

It's not as terse as the jq equivalent - I'll probably add a way to create user-defined functions, so you can alias stuff like this to shorter forms - but that one will require more thought.

Wasn't expecting such a quick (if any!) response! Excellent, ta. That gives me the same output from my file as jq does with `to_entries[]`.

Unfortunately my next issue is how do I iterate over an array of objects (like jq `.[]`)? I'm guessing it's maybe something to do with `range` but I don't know how many I have in order to fill in those indices and I can't do `(elem 0) ... (elem 1)` for the same reason.

Does fzf let you specify autocomplete for the command based on the input? That would be amazing.

> For example how would you take key k1 from a list of dicts [{k1: v1, k2: v2}, {k1: v3}]?

    '.[].k1'

  $ echo '[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]' | jq -r '.[] | .k1'
  v1
  v3

https://codefaster.substack.com/p/mastering-jq-part-1-59c

1. parse a json value from stdin and set it as the initial result

2. for each function, apply the function to the result, and set the output as the result for the next function.

3. The final result is pretty printed on stdout.
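Those steps can be seen by comparing jq's `|` with a shell pipe. This is only a sketch of the mental model, not a claim about how jq is implemented: a single jq program with two filters gives the same output here as two jq processes chained in the shell, since each jq reads a stream of JSON values from stdin.

```shell
# Sample input from the original question, with string values filled in
data='[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]'

# One jq program, two filters joined by jq's own pipe
echo "$data" | jq -c '.[] | .k1'
# "v1"
# "v3"

# Two jq processes joined by a shell pipe: the first emits two objects,
# the second applies .k1 to each object it reads
echo "$data" | jq -c '.[]' | jq -c '.k1'
# "v1"
# "v3"
```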

Yeah I too found the syntax somewhat unintuitive at times. But to answer your question, you would do 'map(.k1)'
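One difference worth noting, assuming the sample input from the question with string values filled in: `map(.k1)` returns a single array, whereas `.[].k1` emits each value separately. For arrays, `map(f)` behaves like `[.[] | f]`.

```shell
data='[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]'

# map collects the results back into one array
echo "$data" | jq -c 'map(.k1)'      # ["v1","v3"]
echo "$data" | jq -c '[.[] | .k1]'   # ["v1","v3"]

# .[] on its own emits a stream of separate values
echo "$data" | jq -c '.[].k1'
# "v1"
# "v3"
```

Which one fits depends on whether the next tool in the pipeline wants one JSON array or one value per line.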

> gives me the feeling of looking into cryptic Perl or regex

Dunno if you'll see this given how many replies you already got, but rather than just dumping "how do you do that" here's a realization I had a while ago that made it way easier to understand:

jq's language is a series of filters/transformers more akin to bash pipes on a stream of data than anything else.

For example, just "." selects out the current object (and is needed to match the "root" at the start of the query), and jq pretty-prints the results (when to a terminal):

  $ echo '[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]'
  [{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]
  $ echo '[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]' | jq '.'
  [
    {
      "k1": "v1",
      "k2": "v2"
    },
    {
      "k1": "v3"
    }
  ]

There's only 1 matching element here, the outermost array. We want to go one deeper, so use "[]" to unwrap/flatten it:

  $ echo '[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]' | jq '.[]'
  {
    "k1": "v1",
    "k2": "v2"
  }
  {
    "k1": "v3"
  }

jq is now iterating over 2 objects, so the next filter is the one where you select out the key you want. This can be done in two different ways for this example (per sibling replies):

  $ echo '[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]' | jq '.[].k1'
  "v1"
  "v3"
  $ echo '[{"k1": "v1", "k2": "v2"}, {"k1": "v3"}]' | jq '.[] | .k1'
  "v1"
  "v3"

Note how I broke these up: The atoms are ".", "[]", and ".k1" - ".[]" isn't one of them, despite what it may look like at first glance when compared to ".k1". Some additional examples to show how these combine:

The "unwrap/flatten" [] can be used multiple times when nested arrays are involved, with or without the pipe syntax, but only works on arrays and objects (for an object it yields the values). It errors if given anything else, such as a number:

  $ echo '[[1,2,3],[4,[5,6]]]' | jq '.[]'
  [
    1,
    2,
    3
  ]
  [
    4,
    [
      5,
      6
    ]
  ]
  $ echo '[[1,2,3],[4,[5,6]]]' | jq '.[][]'
  1
  2
  3
  4
  [
    5,
    6
  ]
  $ echo '[[1,2,3],[4,[5,6]]]' | jq '.[][][]'
  jq: error (at <stdin>:1): Cannot iterate over number (1)

  $ echo '[[1,2,3],[4,[5,6]]]' | jq '.[] | .[]'
  1
  2
  3
  4
  [
    5,
    6
  ]
  $ echo '[[1,2,3],[4,[5,6]]]' | jq '.[] | .[] | .[]'
  jq: error (at <stdin>:1): Cannot iterate over number (1)

Also notice how the "." is needed after the pipes; these are separate filters/transformations being chained together, so as a new rule it needs the same "." as with the first one.
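If the "Cannot iterate over number" error on mixed nesting is unwanted, a couple of standard options exist. As a sketch using the same nested input: `.[]?` skips values that can't be iterated, and `flatten` collapses every level of array nesting. The last command uses an invented object to show what `.[]` does when it is given an object.

```shell
data='[[1,2,3],[4,[5,6]]]'

# .[]? silently skips the numbers instead of raising an error,
# so only the contents of the innermost array come through
echo "$data" | jq -c '.[] | .[] | .[]?'
# 5
# 6

# flatten collapses all levels of nesting into one array
echo "$data" | jq -c 'flatten'   # [1,2,3,4,5,6]

# On an object, .[] yields the values
echo '{"a": 1, "b": 2}' | jq -c '.[]'
# 1
# 2
```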