by posnet
3146 days ago
Edit: Why was the above comment flagged? How much of BigQuery's performance do you think stems from Capacitor versus the rest of the system? For example, if you swapped it out for Parquet but kept everything else (Colossus, Dremel, background reordering, metadata stored in Spanner, etc.), would it still be only 10/30/50% worse, or would it be an order of magnitude worse?
And yeah, this really depends not just on the dataset but also on how selective your queries are, which predicates and aggregations they use, and so on. A significant percentage of queries get orders of magnitude faster. I can’t disclose how much faster things got on average, but it was a significant gain, far more than enough to offset the increased cost of encoding (another aspect people typically don’t consider), even accounting for the fact that much of the data people encode is hardly ever touched.