| HN Mirror


by solidangle 2314 days ago

> Blazing fast results

I highly doubt this, given that the query engine is interpreted and non-vectorized. Without compilation or vectorization, queries are 10x to 100x slower on a simple query, and 100x to 1000x slower on a query with large aggregations and joins.

> Full SQL Exploration

Except for window functions, it seems. These actually matter to data analysts.

3 comments

benesch 2314 days ago

Considerations are completely different in a streaming context. It’s not so much about how fast you can churn through terabytes of data; it’s more about how quickly you can turn around the incremental computation with each new datum. There’s some serious research behind this product, in timely and differential dataflow, and I’d encourage you to check out some of that research before making sweeping performance claims. Frank’s blog post on TPC-H is a good place to start: https://github.com/frankmcsherry/blog/blob/master/posts/2017...

We definitely have some performance engineering work to do in Materialize, but don’t let the lack of vectorization scare you off. It’s just not as important for a streaming engine.
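
To make the batch vs. streaming distinction concrete, here is a minimal sketch in Python of maintaining a `SUM ... GROUP BY` incrementally. It is not Materialize's or differential dataflow's actual implementation; the names `IncrementalSum` and `recompute_sum` are made up for illustration, and the only idea borrowed is that each change arrives as a record plus a signed diff (+1 for an insert, -1 for a retraction).

```python
from collections import defaultdict


class IncrementalSum:
    """Hypothetical sketch: maintains SUM(value) GROUP BY key under updates."""

    def __init__(self):
        self.totals = defaultdict(int)  # running sum per key
        self.counts = defaultdict(int)  # number of live rows per key

    def update(self, key, value, diff):
        # diff is +1 for an inserted row and -1 for a retracted row.
        # Work done here is O(1) per change, independent of table size.
        self.totals[key] += value * diff
        self.counts[key] += diff
        if self.counts[key] == 0:
            # No rows left for this key, so drop it from the result.
            del self.totals[key]
            del self.counts[key]

    def result(self):
        return dict(self.totals)


def recompute_sum(rows):
    """Batch baseline: rescans every row each time the answer is needed."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


view = IncrementalSum()
view.update("a", 5, +1)
view.update("a", 3, +1)
view.update("b", 2, +1)
view.update("a", 5, -1)  # retract the first row

# The incrementally maintained view matches a full recomputation
# over the rows that are still present.
assert view.result() == recompute_sum([("a", 3), ("b", 2)])
```

The batch function's cost grows with the size of the input on every refresh, while the incremental version pays only for the rows that changed, which is the metric that matters per new datum.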


edmundsauto 2314 days ago

It's one thing to be skeptical and ask for evidence of speed, and another to dismiss the claims out of hand after a casual review of their website. Or did I miss that you tried it out and found it wanting?

I work in an org with > 100 data scientists. I bet that 50% have never used window functions. I would guess that fewer than 20% know how to write one.

Your intent in this comment is unclear, but if you were looking to provide actionable feedback, you might want to reconsider your tone. This project looks like a pretty impressive feat of applying CS theory to do useful stuff.


solidangle 2314 days ago

FWIW, I do think this project is really cool. I should have taken a little bit more time to write this comment, as it's overly negative right now.
