
by niviksha 1486 days ago

Thanks for sharing this. It's a very interesting problem that highlights some of the technical challenges of working with modern event data, which happens to 'prefer' being semi-structured (i.e., JSON is the most natural serialization format when creating events).

It's also something we're working on! Shameless plug: I work at Sneller (sneller.io, open source at https://github.com/SnellerInc/sneller), which might be interesting to you.

A couple of key ideas. First, we avoid any 'semi-structured to relational' ETL/ELT overhead by running vectorized SQL directly on a (compressed) binary form of the JSON data that preserves its original structure. So we're schema-on-read first and foremost: you don't need to worry about new fields being added to the source JSON, as long as your queries know about them.
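To make the schema-on-read idea concrete, here is a minimal Python sketch. It is not Sneller's engine or query syntax, just an illustration under stated assumptions: the events, the field names, and the helpers `get_path` and `select` are all hypothetical. The point is that no schema is declared up front, and fields that only appear in later events are resolved when the query runs.

```python
import json

# Hypothetical event stream: later events add fields the earlier ones lack.
RAW_EVENTS = """\
{"ts": 1, "user": "a", "action": "login"}
{"ts": 2, "user": "b", "action": "click", "target": {"page": "/home"}}
{"ts": 3, "user": "a", "action": "click", "target": {"page": "/docs"}, "latency_ms": 42}
"""

def get_path(record, path):
    """Resolve a dotted path at read time; return None if any step is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def select(lines, fields, where=None):
    """Project `fields` from each JSON line, keeping rows where `where(record)` holds."""
    rows = []
    for line in lines.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if where is None or where(record):
            rows.append(tuple(get_path(record, f) for f in fields))
    return rows

# Roughly: SELECT user, target.page, latency_ms WHERE action = 'click'
rows = select(
    RAW_EVENTS,
    ["user", "target.page", "latency_ms"],
    where=lambda r: r.get("action") == "click",
)
```

A record that lacks a requested field simply yields `None` for it, so adding `latency_ms` to new events never requires a migration of the older ones.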

Second, we completely separate storage from compute. Unlike CH, we don't use local disk as any sort of storage tier; cloud object stores are our _primary_ storage tier. So all your data (including the compressed binary version of your source JSON) lives in S3 buckets under your control.
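As a rough illustration of that split (again a sketch, not how Sneller is built): below, a plain dict stands in for an object store bucket, zlib-compressed JSON lines stand in for the compressed binary form, and the names `bucket`, `ingest`, and `count_where` are made up for the example. The query function keeps no local state; everything it needs is fetched from the store on each call.

```python
import json
import zlib

# Stand-in for an object store bucket (key -> bytes). Assumption: a real
# deployment would talk to S3 or a similar service instead of a dict.
bucket = {}

def ingest(key, events):
    """Write events to the object store as a single compressed blob."""
    payload = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    bucket[key] = zlib.compress(payload)

def count_where(key, predicate):
    """Stateless query worker: fetch, decompress, scan. Nothing is kept on local disk."""
    payload = zlib.decompress(bucket[key]).decode("utf-8")
    return sum(1 for line in payload.splitlines() if predicate(json.loads(line)))

ingest("events/part-0001", [
    {"user": "a", "action": "login"},
    {"user": "b", "action": "click"},
    {"user": "a", "action": "click"},
])

clicks = count_where("events/part-0001", lambda e: e["action"] == "click")
```

Because the worker holds nothing between calls, any number of them could be started or stopped without moving data.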

Feel free to check us out and let us know what you think!

1. GitHub - https://github.com/SnellerInc/sneller

2. Intro blog - https://github.com/SnellerInc/blogs/blob/main/introducing-sn...