by majoe
511 days ago
Arrow is pretty cool, although I haven't had the opportunity to use it yet. I skimmed the paper you linked and wondered how one measures the ser/de time a query takes, or more generally, how one would estimate the possible speedup of using Arrow Flight to communicate with a database. Do you by chance have any insights in that direction? At work we have a Java application that produces a large amount of simulation results (ca. 1 TB per run), which are stored in a database. I suspect that a lot of time is wasted on ser/de when aggregating the results, but it would be good to have some numbers.
You can sort of see what benefits you might get from a post like this, though: https://willayd.com/leveraging-the-adbc-driver-in-analytics-...
While we're not using Arrow on the wire here, the ADBC driver uses Postgres's binary format (which is still row-oriented) together with COPY, and can get significant speedups compared to other Postgres drivers.
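If you want rough numbers before touching the database, one option is a synthetic micro-benchmark that isolates the decode step. This is only an illustrative sketch under my own assumptions, not how the linked post measured anything: it compares parsing row-oriented text (loosely like a text wire format or CSV) against reading contiguous per-column binary buffers, using just the Python standard library. The row shape (id, timestep, value) and all function names are hypothetical.

```python
import array
import random
import time


def make_rows(n, seed=0):
    """Hypothetical 'simulation result' rows: (id, timestep, value)."""
    rng = random.Random(seed)
    return [(i, i % 1000, rng.random()) for i in range(n)]


def text_roundtrip(rows):
    """Encode rows as tab-separated text, then time only the parsing step."""
    payload = "\n".join(f"{a}\t{b}\t{c!r}" for a, b, c in rows).encode()
    start = time.perf_counter()
    decoded = []
    for line in payload.decode().split("\n"):
        a, b, c = line.split("\t")  # per-row, per-field string parsing
        decoded.append((int(a), int(b), float(c)))
    return decoded, time.perf_counter() - start


def columnar_roundtrip(rows):
    """Encode each column as one contiguous binary buffer, then time only decoding."""
    buffers = [
        ("q", array.array("q", (r[0] for r in rows)).tobytes()),  # 64-bit ids
        ("q", array.array("q", (r[1] for r in rows)).tobytes()),  # 64-bit timesteps
        ("d", array.array("d", (r[2] for r in rows)).tobytes()),  # float64 values
    ]
    start = time.perf_counter()
    columns = []
    for typecode, buf in buffers:
        col = array.array(typecode)
        col.frombytes(buf)  # bulk copy of the buffer, no per-value parsing
        columns.append(col)
    elapsed = time.perf_counter() - start
    # Rebuild row tuples outside the timed region, only so results can be compared.
    return list(zip(*columns)), elapsed


if __name__ == "__main__":
    rows = make_rows(200_000)
    _, t_text = text_roundtrip(rows)
    _, t_cols = columnar_roundtrip(rows)
    print(f"text decode: {t_text:.4f}s  columnar decode: {t_cols:.4f}s")
```

The ratio between the two timings only hints at the decode overhead you might shave off; the real driver, network transfer and the database's own execution all change the picture. For actual numbers on your setup, the same idea applies: time the fetch-and-convert loop in your Java client separately from the query itself.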
The other thing might be to consider whether you can just dump to Parquet files or something like that and bypass the database entirely (maybe using Iceberg as well).