by majorturd 5048 days ago
From TFA: "We discuss the core ideas in the context of a read-only system, for simplicity. Many Dremel queries are one-pass aggregations; therefore, we focus on explaining those and use them for experiments in the next section. We defer the discussion of joins, indexing, updates, etc. to future work." Really, it takes Dremel multiple SECONDS to complete trivial massively parallelized read queries? It must take hours for an UPDATE or JOIN then. Wake me up when you move past the trivial, until then, enjoy your hair.

Dremel is a query tool, not a database.
That's true, BUT to query you need to have the ability to perform joins. That is what makes raw MapReduce such a pain and even higher-level abstractions slow. I like the idea that Dremel is showing, I even downloaded the Google paper to read tonight, but the Apache implementation needs to have joins; otherwise it's not a "query tool".
You can join on top of BigQuery with small join tables. https://developers.google.com/bigquery/docs/query-reference#...
cough Here small means less than 8MB of compressed data cough
What is the performance of your JOIN system?