by __all__
1015 days ago
Thanks! Honestly, I can't envision a near future where SQL is not the main interface, though I'd be happy to see the future prove me wrong! Although I can buy the arguments that a better data structure for communicating between processes (on the same server) could help, it's a bit difficult to wrap my mind around how Arrow will help in distributed systems (compared to any other performant data structure). Do you have any resources that explain the value proposition in that area? The same goes for vector processing: it would be great to read more about optimizations that could improve Postgres outside of purely analytical use cases.
|
Comparing Arrow with the role of Protobuf is perhaps the easiest way in. There's a good FAQ entry [0] which concludes: "Arrow and Protobuf complement each other well. For example, Arrow Flight uses gRPC and Protobuf to serialize its commands, while data is serialized using the binary Arrow IPC protocol".
This will become increasingly significant because of hardware trends in network and memory (and ultimately storage too) relative to CPUs. I posted about that in a comment a few days ago [1], but it's worth sharing again:
> here’s a chart comparing the throughputs of typical memory, I/O and networking technologies used in servers in 2020 against those technologies in 2023
> Everything got faster, but the relative ratios also completely flipped
> memory located remotely across a network link can now be accessed with no penalty in throughput
The graphs demonstrate it very clearly: https://blog.enfabrica.net/the-next-step-in-high-performance...
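To make the "commands vs data" split from the FAQ a bit more concrete, here is a toy, stdlib-only sketch. It is NOT the real Arrow IPC format (which uses Flatbuffers metadata and padded, aligned buffers); the function names `encode_column`/`decode_column` and the JSON header are hypothetical stand-ins. It only shows the key property: a small self-describing header followed by a raw contiguous column buffer that the receiver can view in place, instead of decoding value by value as it would for, e.g., a Protobuf field of varint-encoded integers.

```python
import json
import struct
from array import array


def encode_column(name: str, values: list[int]) -> bytes:
    """Frame one int64 column as: 4-byte header length, JSON header, raw data buffer.

    The JSON header is a hypothetical stand-in for real IPC metadata.
    """
    buf = array("q", values).tobytes()  # contiguous native-endian int64 values
    header = json.dumps({"name": name, "type": "int64", "length": len(values)}).encode()
    return struct.pack("<I", len(header)) + header + buf


def decode_column(message: bytes) -> tuple[str, memoryview]:
    """Parse only the small header, then view the data buffer without copying it."""
    (header_len,) = struct.unpack_from("<I", message, 0)
    header = json.loads(message[4:4 + header_len])
    body = memoryview(message)[4 + header_len:]  # zero-copy slice of the payload
    values = body.cast("q")  # reinterpret the bytes as int64 in place, no per-value parsing
    if len(values) != header["length"]:
        raise ValueError("header length does not match buffer size")
    return header["name"], values


if __name__ == "__main__":
    msg = encode_column("ids", [1, 2, 3])
    name, col = decode_column(msg)
    print(name, col.tolist())
```

The cost of decoding here is proportional to the header, not to the number of rows, which is why a columnar wire format matters more as network and memory bandwidth catch up with (or overtake) what a CPU can spend on per-value deserialization.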
> would be great to read a bit more about some optimizations that would help improving Postgres leaving out pure analytical use cases
Unfortunately I don't have a good reference on that to hand but I'll take a look around and reply again soon.
[0] https://arrow.apache.org/faq/#how-does-arrow-relate-to-proto...
[1] https://news.ycombinator.com/item?id=37365816
[2] https://www.singlestore.com/comparisons/postgresql/