|
|
|
|
|
by rcoveson
1968 days ago
|
|
Better to compare it to Cap'n Proto instead. Arrow data is already laid out in a usable way. For example, an Arrow column of int64s is an 8-byte aligned memory region of size 8*N bytes (plus a bit vector for nullity), ready for random access or vectorized operations. Protobuf, on the other hand, would encode those values as variable-width integers. This saves a lot of space, which might be better for transfer over a network, but means that writers have to take a usable in-memory array and serialize it, and readers have to do the reverse on their end. Think of Arrow as standardized shared memory using struct-of-arrays layout, Cap'n Proto as standardized shared memory using array-of-structs layout, and Protobuf as a lightweight purpose-built compression algorithm for structs. |
|
I just want to say thank you for this part of the sentence. I understand struct-of-arrays vs array-of-structs, and now I finally understand what the heck Arrow is.