
by ryanworl 2710 days ago

No, I am not aware of any storage engine that provides that out of the box. The techniques are very tied into what your query processing engine can do and expects the data to look like.

For example, do you materialize tuples immediately, or do you run the data through the whole processing pipeline and defer materialization until the end?

Your storage engine and format need to be at least somewhat involved in answering that question, because you need to know what data to read and when.
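
To make the materialization question concrete, here is a minimal Python sketch, assuming a toy in-memory column store. The `columns` dict and both function names are hypothetical and only illustrate the idea; no real engine works exactly like this.

```python
# Toy column store: each column is stored as its own list (made-up data).
columns = {
    "id":    [1, 2, 3, 4, 5],
    "price": [10, 250, 40, 900, 75],
    "name":  ["a", "b", "c", "d", "e"],
}

def early_materialization(cols, min_price):
    # Stitch full tuples together first, then filter.
    # Every column is read for every row, even rows that get discarded.
    rows = list(zip(cols["id"], cols["price"], cols["name"]))
    return [(i, n) for (i, p, n) in rows if p >= min_price]

def late_materialization(cols, min_price):
    # Scan only the filter column and keep the matching row positions.
    positions = [pos for pos, p in enumerate(cols["price"]) if p >= min_price]
    # Build output tuples at the very end, fetching only the surviving positions.
    return [(cols["id"][pos], cols["name"][pos]) for pos in positions]

print(early_materialization(columns, 100))
print(late_materialization(columns, 100))
```

Both return the same rows, but the late version only works if the storage format can serve cheap positional lookups per column, which is why the storage engine can't be fully decoupled from the query engine.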


arjunnarayan 2710 days ago

Unfortunately most of the systems that build what you're describing are closed source (e.g. Snowflake, Microsoft SQL Server, Vertica, Teradata). There isn't an open-source project that does all of those things.


georgewfraser 2710 days ago

What about Presto?


FridgeSeal 2710 days ago

Presto is more of a distributed SQL solution: you run it on a cluster of nodes and point them at your storage layer. It’s optimised for querying very large datasets, and it’s not built or tuned for high performance (in terms of latency or execution time).
