| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by xg15 1521 days ago

So, admitted jq fanboy here, but I found a lot of the criticism from the articale really sensible.

I think jq has a pretty elegant data model, but the syntax is often very clunky to work with.

So here is a half thought-out idea how you might improve the syntax for the "stateful operations" usecase the OP outlined:

I think it's not quite true that different elements of a sequence can never interact. The OP mentioned reduce/foreach, but it's also what any function that takes argument does:

If you have an expression 'foo | bar', then bar is called once for every element foo emits. However, foo could also a function that takes arguments. Then you can specify bar as an argument of foo like this: 'foo(bar)'. In this situation, execution of bar is completely controlled by foo. In particular, foo gets to see all elements that foo emits, not just one each. I believe this is how e.g. [x] can collect all elements of x into an array.

In the same way, you could write a function 'add_all(x)' which calls x and adds up all emitted elements to a sum.

However, this wouldn't help you with collecting all input lines, as there is nothing for you function to "wrap around". Or at least, there used to be nothing, but I think in one of the recent build, an "inputs" function was added, which emits all remaining inputs. So now, you can write e.g. '[., inputs]' to reimplement slurp. In the same way, you could sum up all input lines by writing 'add_all(., inputs)'.

However, this is still ugly and unintuitive to write, so I think introducting some syntactic sugar for this would be useful. E.g., you could imagine a "collect operator", e.g. '>>' which treats everything left of it as the first argument to the function to the right of it.

e.g., writing 'a >> b' would desugar to 'b(a)'.

Writing 'a | b >> c' would desugar to 'c(a | b)'.

Any steps further to the right are not affected:

'a | b >> c | d' would desugar to 'c(a | b) | d'.

Scope to the left could be controlled with parantheses:

'a | (b >> c)' would desugar to 'a | c(b)'.

To make this more useful for aggregating on input lines, you could add a special rule that, if the operator is used with no parantheses, it will implicitly prepend '(., inputs)' as the first step.

So if the entire top-level expression is 'a | b >> c', it would desugar to 'c((., inputs) | a | b)'.

This would make many usecases that require keeping state much more straight-forward. E.g. collecting all the "baz" fields into an array could be written as '.baz >> []' which would desugar to '[(., inputs) | .baz]'

Summing up all the bazzes could be written as '.baz >> add_all' which would desugar to 'add_all((., inputs) | .baz)'

...and so on.

On the other hand, this could also lead to new confusion, as you could also write stuff like '... | (.baz >> map) | ...' which would really mean 'map(.baz)' or 'foo >> bar >> baz' which would desugar to the extremely cryptic expression 'baz((., inputs) | bar((., inputs) | foo))'. So I'm not quite sure.

Any thoughts about the idea?

1 comments

spiralx 1520 days ago

My thought is that having | as the pipe operator in the shell means that it's okay to use | for the or operator in your query syntax as the people using your XGQ will be familiar with that usage, but to then introduce a pipe operator (which >> pretty much is) alongside using | as something that isn't a pipe is just going to fuck with your muscle memory for reading code. Especially as >> is already the right-shift operator, and so you expect the precedence for 'a | b >> c' to mean 'a | c(b)'.

The pipe operator that's in its final stages of approval for JavaScript uses '|>' as its sigil, which is a decent compromise between not conflicting with existing operators, being compatible with developers' existing pattern matching, and representing what it does somewhat. And 'a | b |> c' is ok.

You could just have '@c a | b' mean "everything from @c onwards to the end of the string is the argument to c" i.e. 'c(a | b)' and have 'c a | b' be 'c(a) | b', then anything more complicated just requires using the parentheses operator to enclose an expression i.e. 'c (a | b)' or just 'c(a | b)' if your tokenizer is a bit cleverer :) Actually I like this idea, because '@' is syntactic sugar for () around the rest of the query, and a function then operates on the value of the expression following it.