Hacker News
by ryanworl 2710 days ago
No, I am not aware of any storage engine that provides that out of the box. The techniques are closely tied to what your query processing engine can do and what it expects the data to look like.

For example, do you materialize tuples immediately, or do you run the data through your whole processing pipeline and not materialize until the end?

Your storage engine and format need to be at least somewhat involved in answering that question, because you need to know what data to read and when.
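
To make the early vs. late materialization distinction concrete, here is a rough sketch. It assumes a toy in-memory column store (one Python list per column); the column names, data, and function names are all made up for illustration and don't correspond to any real engine.

```python
# Hypothetical in-memory column store: one Python list per column.
columns = {
    "id":    [1, 2, 3, 4, 5],
    "price": [10, 250, 40, 900, 75],
    "name":  ["a", "b", "c", "d", "e"],
}

def early_materialization(cols, min_price):
    # Stitch every row into a tuple first, then filter the tuples.
    # Every column is read for every row, even rows that get discarded.
    rows = zip(cols["id"], cols["price"], cols["name"])
    return [(i, n) for i, p, n in rows if p >= min_price]

def late_materialization(cols, min_price):
    # Scan only the predicate column and remember matching row positions.
    positions = [pos for pos, p in enumerate(cols["price"]) if p >= min_price]
    # Read the other columns only at those positions, at the very end.
    return [(cols["id"][pos], cols["name"][pos]) for pos in positions]

early = early_materialization(columns, 100)
late = late_materialization(columns, 100)
```

Both functions return the same rows. The difference is which columns get read and when, and that is the part the storage format has to cooperate with (here, by supporting positional lookups per column).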


Unfortunately most of the systems that build what you're describing are closed source (e.g. Snowflake, Microsoft SQL Server, Vertica, Teradata). There isn't an open-source project that does all of those things.
What about Presto?
Presto is more of a distributed SQL solution: you run it on a cluster of nodes and point them at your storage layer. It's optimised for querying very large datasets, and it's not built or tuned for high performance in terms of latency or execution time.