Arrow the format is pretty good. There are occasional quirks (the null bitmap uses 1 = non-null, etc.), but they're no big deal.
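To make the bitmap quirk concrete, here is a minimal sketch of reading such a bitmap, assuming the layout from the Arrow columnar format (one bit per slot, least-significant bit first within each byte, a set bit meaning "value present"). The helper names is_valid and count_nulls are made up for illustration and are not Arrow API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Arrow): returns true when slot i holds a value.
// A set bit means "valid / non-null"; bits are numbered LSB-first within each byte.
inline bool is_valid(const std::uint8_t* validity, std::size_t i) {
    assert(validity != nullptr);
    return ((validity[i / 8] >> (i % 8)) & 1u) != 0;
}

// Hypothetical helper: counts the null slots among the first `length` slots.
inline std::size_t count_nulls(const std::uint8_t* validity, std::size_t length) {
    std::size_t nulls = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (!is_valid(validity, i)) ++nulls;
    }
    return nulls;
}
```

So a bitmap byte of 0x05 over four slots reads as valid, null, valid, null, which is the inverse of what a "null mask" would lead you to expect.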

From my experience, Arrow the C++ implementation is pretty solid too, though I don't like it (a matter of taste). I just don't like their "force std::shared_ptr over Array, Table, Schema and basically everything" approach: why not use an intrusive ref count if the object can only be held by a shared_ptr anyway? There are also a lot of const std::shared_ptr<Array>& arguments on functions where it's not obvious whether they take ownership. And then there's the immutable Array + ArrayBuilder split (versus COW, i.e. switching between mutable uniquely owned and immutable shared, as in ClickHouse and friends), so if you have to fill the data out of order you are forced to buffer it on your side.
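For anyone unfamiliar with the alternative, here is a rough sketch of an intrusive ref count plus a COW-style "mutable if uniquely owned" switch. Everything in it (RefCounted, IntrusivePtr, Column, make_mutable) is a hypothetical name invented for this example. It is not Arrow's or ClickHouse's actual code, only an illustration of the general idea under those assumptions.

```cpp
#include <atomic>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical base class: the reference count lives inside the object itself,
// so no separate control block is allocated.
class RefCounted {
public:
    RefCounted() = default;
    RefCounted(const RefCounted&) = delete;
    RefCounted& operator=(const RefCounted&) = delete;
    virtual ~RefCounted() = default;

    void add_ref() const noexcept { refs_.fetch_add(1, std::memory_order_relaxed); }
    // Returns true when the caller just dropped the last reference.
    bool release() const noexcept {
        return refs_.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
    std::size_t use_count() const noexcept { return refs_.load(std::memory_order_relaxed); }

private:
    mutable std::atomic<std::size_t> refs_{0};
};

// Hypothetical smart pointer: one raw pointer wide, count stored in the pointee.
template <typename T>
class IntrusivePtr {
public:
    IntrusivePtr() noexcept = default;
    explicit IntrusivePtr(T* p) noexcept : p_(p) { if (p_) p_->add_ref(); }
    IntrusivePtr(const IntrusivePtr& o) noexcept : p_(o.p_) { if (p_) p_->add_ref(); }
    IntrusivePtr(IntrusivePtr&& o) noexcept : p_(o.p_) { o.p_ = nullptr; }
    // Copy-and-swap covers both copy and move assignment.
    IntrusivePtr& operator=(IntrusivePtr o) noexcept { std::swap(p_, o.p_); return *this; }
    ~IntrusivePtr() { if (p_ && p_->release()) delete p_; }

    T* get() const noexcept { return p_; }
    T* operator->() const noexcept { return p_; }
    T& operator*() const noexcept { return *p_; }

private:
    T* p_ = nullptr;
};

// Hypothetical column type standing in for an array of values.
struct Column : RefCounted {
    explicit Column(std::vector<int> v) : values(std::move(v)) {}
    std::vector<int> values;
};

// COW-style switch: reuse the object when this is the only owner, otherwise clone.
// Callers hand over their reference with std::move to get the in-place path.
inline IntrusivePtr<Column> make_mutable(IntrusivePtr<Column> col) {
    if (col->use_count() == 1) return col;
    return IntrusivePtr<Column>(new Column(col->values));
}
```

With this shape, a uniquely owned column can be written at arbitrary positions in place, and it only gets copied when someone else still holds a reference. The cost is that every type has to opt in by deriving from the base class.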

Do note that a compute engine (e.g. Velox) may still need to implement its own (Arrow-compatible) array types, as there aren't many fancy encodings in Arrow the format.