
by mccanne 1519 days ago

Hi, all. Author here. Thanks for all the great feedback.

I've learned a lot from your comments and pointers.

The Zed project is broader than "a jq alternative" and my bad for trying out this initial positioning. I do know there are a lot of people out there who find jq really confusing, but it's clear if you become an expert, my arguments don't hold water.

We've had great feedback from many of our users who are really productive with the blend of search, analytics, and data discovery in the Zed language, and who find manipulating eclectic data in the ZNG format to be really easy.

Anyway, we'll write more about these other aspects of the Zed project in the coming weeks and months, and in the meantime, if you find any of this intriguing and want to kick the tires, feel free to hop on our Slack with questions/feedback or file GitHub issues if you have ideas for improvements or find bugs.

Thanks a million!

https://github.com/brimdata/zed https://www.brimdata.io/join-slack/

4 comments

preferjq 1519 days ago

"Cobbled-together" jq, as it often appears in the wild, will often compare badly with crafted solutions because the writer's goal is usually GSD, not writing pretty code.

People with the time and inclination to slow down and think a little more about how the tools work will produce cleaner solutions.

In your example, to convert

    {"name":"foo","vals":[1,2,3]}

into

    {"name":"foo","val":1}
    {"name":"foo","val":2}
    {"name":"foo","val":3}

All you need is this jq filter

    {name:.name, val:.vals[]}

To me this is much better than the proposed zq or jq solution you're using as a basis for comparison. You could almost use the shorter

    .vals = .vals[]

if the key name in the output didn't change (vals to val).

These filters take advantage of how jq's [] operator turns a single result into separate results, one per element. For people new to jq this behavior is often confusing unless they've seen things like Cartesian products.

.[] - https://stedolan.github.io/jq/manual/#Array/ObjectValueItera...
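For readers who don't use jq, here is a rough Python sketch of the fan-out described above. It is only an analogy, not how jq is implemented: the helper name explode is made up for illustration, and the Cartesian product is spelled out with itertools.product to show why one input object becomes three output objects.

```python
from itertools import product

def explode(record):
    """Yield one output object per element of record["vals"].

    Hypothetical helper that mimics the observable behavior of the
    jq filter {name:.name, val:.vals[]}; it is not jq's implementation.
    """
    names = [record["name"]]  # .name produces exactly one result
    vals = record["vals"]     # .vals[] produces one result per element
    # The output is the Cartesian product of the two result streams.
    for name, val in product(names, vals):
        yield {"name": name, "val": val}

rows = list(explode({"name": "foo", "vals": [1, 2, 3]}))
for row in rows:
    print(row)
```

With one name and three values the product has 1 x 3 = 3 rows, which matches the three output lines in the example.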


MarkMarine 1519 days ago

Counterpoint: I reach for jq probably twice a year. It's a slog every time, but way, way less work than diving into the terse syntax and understanding the inner workings of jq. A good abstraction is the border of my understanding; a leaky abstraction means I have to have mastery of the internals to be successful. jq is a leaky abstraction.


hyperpallium2 1519 days ago

You can also use name instead of name:.name, giving {name, val:.vals[]}.

I think jq is very elegant - genius even - but whenever I use it, I have to look up the docs for syntax. But I guess that's true for any infrequently used tool.


chris37879 1519 days ago

This exactly. I think jq's problem in this regard is further compounded because its query language just doesn't feel like anything else most people have used. I've certainly never come across anything quite like it, anyway.


1vuio0pswjnm7 1519 days ago

Thank you for your work on tcpdump, (original) bpf and the pcap library. I benefit from those projects every day.

ZSON looks way better than JSON. I pray that the Zed project becomes more popular.


mccanne 1519 days ago

Wow, thanks.

Coincidentally, after hearing of a friend's woes dealing with massive amounts of CSV coming from a BPF-instrumented kernel, I played around a bit with integrating Zed and BPF. Just an experimental toy (and the repo is already out of date)...

https://github.com/brimdata/zbpf

The nice thing about Zed here is any value can be a group-by key so it's easy, for example, to use kernel stacks (an array of strings) in a grouping aggregate.
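To make the grouping idea concrete, here is a small Python sketch of counting events by kernel stack. This is not Zed or zbpf code, and the sample stacks and field names are invented for illustration. Note the extra step Python needs: lists are not hashable, so each stack has to be converted to a tuple before it can serve as a grouping key.

```python
from collections import Counter

# Made-up sample events; each "stack" is an array of strings.
samples = [
    {"pid": 101, "stack": ["do_sys_open", "vfs_open"]},
    {"pid": 102, "stack": ["do_sys_open", "vfs_open"]},
    {"pid": 103, "stack": ["tcp_sendmsg", "ip_queue_xmit"]},
]

def count_by_stack(events):
    """Count events per distinct stack (an array used as the group-by key)."""
    # tuple(...) makes the list hashable so it can be a dictionary key.
    return Counter(tuple(event["stack"]) for event in events)

counts = count_by_stack(samples)
for stack, n in counts.items():
    print(list(stack), n)
```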

(p.s. for the record, the only thing I have to do with the modern Linux BPF system is the tiny vestige of origin story it shares with the original work I did in the BSD kernel around 1990)


rienko 1519 days ago

Ever since my team started using Splunk (circa 2012), we've clamored for a more open version that we could tinker with and that wouldn't cost an arm and a leg to ingest multiple terabytes of data daily.

Positioning as an open-source Splunk would be an interesting play. Going through your docs, the union() function looks like it returns a set, akin to Splunk's values(). Is there an equivalent to list()?

Elastic is great in its lane, but it requires more resources and carries a monolith's weight, which left a sour taste in our internal testing. A minimal Elasticsearch-compatible API would open up your target audience. Are there any plans to do it in the short term (< 1 year)?


mccanne 1519 days ago

That's a cool idea. We've had many collaborators using Zed lakes for search at smallish scale and we are still building the breadth of features needed for a serious search platform, but I think we have a nice architecture that holds the promise to blend the best of both worlds of warehouses and search.

As for list() and values() functions, Zed has native arrays and sets so there's no need for a "multi-value" concept as in Splunk. If you want to turn a set into an array, a cast will do the trick, e.g.,

    echo '1 2 2 3 3' | zq 'u:=union(this) | cast(u,<[int64]>) ' -

    [1,2,3]

(Note that <[int64]> is a type value that represents array of int64.)
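As a loose Python analogy to the set-then-cast step above (not Zed code; the function name is made up for illustration): collect the values into a set to drop duplicates, then convert the set to a list. Sorting the result is an assumption added here so the output order is deterministic, since Python sets are unordered.

```python
def union_then_cast(values):
    """Aggregate values into a set, then convert the set to a list."""
    u = set(values)    # plays the role of union(this): duplicates collapse
    return sorted(u)   # plays the role of the cast to an array (sorted by choice)

result = union_then_cast([1, 2, 2, 3, 3])
print(result)  # [1, 2, 3]
```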


gauravphoenix 1519 days ago

There is Dassana [1] if someone wants to try out a JSON-native, index-free, schemaless solution built on top of ClickHouse.

Show HN post (FAQ) [2]

Disclaimer: I'm the founder/CEO of Dassana.

[1] https://lake.dassana.io/

[2] https://news.ycombinator.com/item?id=31111432


noborus 1513 days ago

I wrote about how to solve this with SQL. https://noborus.github.io/blog/jqsql/
