Hacker News new | ask | show | jobs
by tycho01 3772 days ago
I'm interpreting this as saying they want to use the same representation to use in memory (for querying) and for 'serialization' (sending the same thing over the wire). This begs the question why separate serialized representations ever became a thing in the first place.

My understanding is that serialization became a thing because in-memory representations tend to use pointers to shared data structures that may thus be referenced multiple times while being stored only once. This would not translate 1:1 to serialized representations (where memory offsets would no longer hold meaning) -- much less in any language-agnostic way.

So I have this suspicion that Apache Arrow would not support reusing duplicate data while storing it only once. Would anyone mind clarifying on this point?