|
|
|
|
|
by joelwilsson
3143 days ago
|
|
BigQuery uses a lot of tricks to get efficiency, but this post emphasizes Apache Arrow and open data formats like it as the way forward (in particular the last point, "Be open, or else…") which are not currently supported by BigQuery. If Apache Arrow takes off I hope BigQuery will support it as a data interchange format in the future. Zero-copy is pretty awesome, as are open standards in general. This feature does not exist in BigQuery today (as far as I know - definitely not as discussed in the source). |
|
So it’s a little incorrect to think of this as a “file format” in the first place. If you end up designing it like that, you’d not be able to have a lot of the gains that the Abadi paper (and others like it) alludes to. My suggestion would be to go whole hog and push down as much filtering and aggregation in there as is feasible, exposing a higher level interface with _at least_ filtering predicate support, and do it in C++.