Hacker News
by polskibus 3145 days ago
How does Dremio differ from PrestoDB? As far as I know, PrestoDB can also virtualize access to many data sources and join data between them. We didn't go deep with PrestoDB because our basic tests of multi-source joins ran very slowly, and it seemed to pull all the data from both joined tables into one place. I'm not a PrestoDB expert, so maybe there's a better way to do it (all suggestions welcome).

What's the differentiator? Is Dremio smarter somehow, avoiding copying all the data to perform a simple join? Or does it copy the data the same way, with Arrow making it faster than Presto? What's on your roadmap?
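The "pull all data from both tables" behavior described above can be contrasted with predicate pushdown in a toy sketch. Everything here is hypothetical: `Source`, `hash_join`, the `customers`/`orders` tables and their columns are stand-ins invented for illustration, not PrestoDB's or Dremio's actual code. The sketch counts how many rows each strategy moves from the sources to the join.

```python
# Toy illustration of predicate pushdown across two hypothetical sources.
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]


@dataclass
class Source:
    """Hypothetical data source that counts the rows it ships to the engine."""
    rows: List[Row]
    rows_transferred: int = 0

    def scan(self, predicate: Optional[Callable[[Row], bool]] = None) -> List[Row]:
        # With a pushed-down predicate, filtering happens at the source,
        # so only matching rows are transferred.
        out = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_transferred += len(out)
        return out


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    # Build a hash table on the right input, then probe it with the left input.
    index: Dict[object, List[Row]] = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    return [{**left_row, **right_row}
            for left_row in left
            for right_row in index.get(left_row[key], [])]


def make_sources() -> Tuple[Source, Source]:
    customers = Source([{"customer_id": i, "region": "EU" if i % 4 == 0 else "US"}
                        for i in range(100)])
    orders = Source([{"order_id": i, "customer_id": i % 100,
                      "year": 2017 if i % 10 == 0 else 2016}
                     for i in range(1000)])
    return customers, orders


def naive_join() -> Tuple[List[Row], int]:
    # Pull every row from both sources, join, then filter.
    customers, orders = make_sources()
    joined = hash_join(orders.scan(), customers.scan(), "customer_id")
    result = [r for r in joined if r["region"] == "EU" and r["year"] == 2017]
    return result, customers.rows_transferred + orders.rows_transferred


def pushdown_join() -> Tuple[List[Row], int]:
    # Push each single-table filter down to its source before joining.
    customers, orders = make_sources()
    eu = customers.scan(lambda r: r["region"] == "EU")
    recent = orders.scan(lambda r: r["year"] == 2017)
    result = hash_join(recent, eu, "customer_id")
    return result, customers.rows_transferred + orders.rows_transferred


if __name__ == "__main__":
    naive_rows, naive_moved = naive_join()
    pushed_rows, pushed_moved = pushdown_join()
    print(len(naive_rows), naive_moved)    # 50 1100
    print(len(pushed_rows), pushed_moved)  # 50 125
```

Both strategies return the same 50 rows; only the data movement differs. A real engine can push a filter down only to a source able to evaluate it, which is one reason pushdown capabilities vary between systems.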


PrestoDB is similar to Impala, Hive and other SQL engines: each is designed to do distributed SQL processing. Dremio does embed an OSS distributed SQL processing engine as well (Sabot, built natively on Arrow), but we see that only as a means to an end. Our focus is much more on being a BI and data fabric/service.

At the core of this vision are very advanced pushdowns (far beyond other OSS systems), a powerful self-service UI for managing, curating and sharing data (designed for analysts, not just engineers) and, most importantly, the first open source implementation of distributed relational caching for all types of data. You can see more details about this last part in a deck I presented at DataEngConf earlier today: https://www.slideshare.net/dremio/using-apache-arrow-calcite...
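The general idea behind relational caching can be shown with a toy, single-process sketch, assuming the cache holds one precomputed aggregate (SUM of a hypothetical `amount` column grouped by `region` and `year`). Queries that group by a subset of those columns are answered by rolling up the cached result instead of rescanning the base table. `RawTable`, `RelationalCache`, `sum_by` and the sales data are all hypothetical; this is not Dremio's implementation, which per the reply is distributed and covers far more than one aggregate shape.

```python
# Toy illustration of relational caching: answer aggregate queries from a
# cached, precomputed aggregate instead of rescanning the base table.
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]
Groups = Dict[tuple, int]

CACHED_KEYS: Tuple[str, ...] = ("region", "year")


class RawTable:
    """Hypothetical base table that counts full scans."""

    def __init__(self, rows: List[Row]):
        self.rows = rows
        self.scans = 0

    def scan(self) -> List[Row]:
        self.scans += 1
        return self.rows


def sum_by(rows: List[Row], keys: Tuple[str, ...], measure: str) -> Groups:
    # Equivalent of: SELECT keys, SUM(measure) ... GROUP BY keys
    out: Groups = defaultdict(int)
    for r in rows:
        out[tuple(r[k] for k in keys)] += r[measure]
    return dict(out)


class RelationalCache:
    """Holds one materialized aggregate: SUM(amount) GROUP BY region, year."""

    def __init__(self, table: RawTable):
        self.table = table
        self.materialized: Optional[Groups] = None

    def sum_amount_by(self, keys: Tuple[str, ...]) -> Groups:
        if not set(keys) <= set(CACHED_KEYS):
            # The cache cannot answer this grouping; fall back to a full scan.
            return sum_by(self.table.scan(), keys, "amount")
        if self.materialized is None:
            # Build the cached aggregate once, with a single scan.
            self.materialized = sum_by(self.table.scan(), CACHED_KEYS, "amount")
        # SUM is re-aggregatable, so coarser groupings can be rolled up.
        out: Groups = defaultdict(int)
        for group, total in self.materialized.items():
            row = dict(zip(CACHED_KEYS, group))
            out[tuple(row[k] for k in keys)] += total
        return dict(out)


def make_sales() -> List[Row]:
    return [{"region": ("EU", "US", "APAC")[i % 3], "year": 2016 + i % 2,
             "amount": i % 7 + 1} for i in range(1000)]


if __name__ == "__main__":
    sales = make_sales()
    table = RawTable(sales)
    cache = RelationalCache(table)
    print(cache.sum_amount_by(("region",)))
    print(cache.sum_amount_by(("year",)))
    print(table.scans)  # 1: both queries were served from the cached aggregate
```

The roll-up is valid here because SUM can be re-aggregated; an average would need SUM and COUNT cached separately, and COUNT(DISTINCT ...) generally cannot be rolled up this way.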