Hacker News new | ask | show | jobs
by lmeyerov 1965 days ago
This may help you do zero copy for a column of multi-dim without losing value types, just that it's encoding a multi-dim. This example is for values that are 3x3 of int8's:

```

import pyarrow as pa

my_col_of_3x3s = pa.struct([ (f'f_{x}_{y}', pa.int8()) for x in range(3) for y in range(3) ])

```

If using ndarrays, I think our helpers are another ~4 lines each. Interop with C is even easier, just cast. You can now pass this data through any Arrow-compatible compute stack / DB and not lose the value types. We do this for streaming into webgl's packed formats, for example.

What you don't get is a hint to the downstream systems that it is multidimensional. Tableau would just let you do individual bar charts, not say a heatmap, assuming they support rank 2's. To convert, you'd need to do that zero-copy cast to whatever they do support. I agree a targetable standard would avoid the need for that manual conversion, and increases the likelihood they use the same data rep.

Native support would also avoid some header bloat from using structs. However, we find that's fine in practice, it's metadata. E.g., our streaming code reads the schema at the beginning and then passes it along, so actual payloads are pure data, and skip resending metadata.