by posnet 3146 days ago
[Edit]: Why was the above comment flagged?

How much of BigQuery's performance do you think stems from Capacitor versus the rest of the system? For example, if you swapped it out for Parquet but kept everything else (Colossus, Dremel, background reordering, metadata stored in Spanner, etc.), would it still be 10/30/50% worse, or would it be an order of magnitude worse?


We looked at Parquet early on and it wasn’t competitive even with what we were using at the time.

And yeah, this really depends not just on the dataset, but also on how selective your queries are, what predicates and aggregations they employ, etc. A significant percentage of queries got orders of magnitude faster. I can’t disclose how much faster things got on average, but it was a significant gain, way more than enough to offset the increased cost of encoding (which is another aspect people typically don’t consider), even considering that much of the data people encode is hardly ever touched.

It would have to depend on the dataset, right?

For anyone who doesn't know what Capacitor is: https://cloud.google.com/blog/big-data/2016/04/inside-capaci...

> Why was the above comment flagged?

It appears that relatively new accounts that post here are automatically flagged. I've seen it before.

Usually if I click on their specific comment and vouch for them, they get unflagged, but it didn't work this time.

Lame.