by awaythrowact 1788 days ago
Fair. I agree Arrow is still more of a vision than anything else.

> it didn't even provide a shape attribute

I suspect this has to do with the project's focus. I think they aspire to be a back-end to DataFrame libraries, which are generally 2d. I think they (correctly) are ceding the "n-dimensional tensor computation" space to the current incumbents.


Arrow is getting support for N-d arrays, so if anything they're expanding in that area (which is exciting). I don't think they're interested in creating a universal libarrow, though; the point of the data format and C data interface is to have languages define their own implementations.

I may be wrong. It happens a lot! But I think Arrow's vision encompasses compute, not just a data format and data interface.

https://www.slideshare.net/wesm/pycon-colombia-2020-python-f...

Slide 43: The "Arrow C++ Platform" encompasses a "Multi-core Work Scheduler" and a "Query Engine"

Slide 38: "It would be more productive (long-term) to have a reusable computational foundation for data frames"

Again, I agree that, today, it's more data format, and the shared compute stuff is more a vision.

EDIT: See also https://ursalabs.org/tech/

For sure, I didn't mean to imply they weren't looking at compute too! https://github.com/apache/arrow-datafusion is another example of the shared compute vision. What I was trying to point out is that (at least for Arrow core) they seem to eschew FFI and generating shared libraries in favour of from-scratch implementations in other compiled languages and direct bindings in interpreted ones.