|
|
|
|
|
by jacquesnadeau
3145 days ago
|
|
The Arrow project itself is a set of libraries. One of the things we'll do is try to add more algorithms over time to it so if you want say, a fast arrow sort or arrow predicate application. Full SQL is always far more complex and I can't see the project itself . The engine inside of Dremio is something we call Sabot (a shoe for modern arrows, see sabot round on wikipedia). We hope to make it modular enough one day to use a library but it isn't there yet. In regards to your other question re projects/products: Arrow contributors are actively trying to get more adoption of Arrow as an interchange format for several systems. We've had discussions around Kudu (no serious work done yet afaik). Parquet-to-Arrow for multiple languages is now available. Arrow committers include committers from several other projects such as HBase, Cassandra, Phoenix, etc. The goal is ultimately to figure integrations with all. In most cases, these data storage systems are saddled with slow interfaces for data access. (Think row-by-row, cell-by-cell interfaces.) Arrow, among other things, allows them to communicate through a much faster mechanism (shared memory--or at least shared representation if not node local). |
|
What's the differentiator? Is dremio smarter somehow and avoids copying all data to perform a simple join? Or does it copy the data the same way but Arrow lets it be faster than Presto? What's on your roadmap?