by la6471 2053 days ago
What is the difference between Apache Spark and Apache Arrow?

Apache Arrow is a specification for in-memory columnar data, plus an IPC format and the Flight protocol, with implementations in a number of languages. Some of the implementations contain code to perform computations on the in-memory data, and some contain a query engine of some form. All of these are single-process libraries rather than distributed systems.
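To make "in-memory columnar data" a bit more concrete, here is a minimal Python sketch of the idea behind an Arrow-style primitive column: one contiguous values buffer plus a validity bitmap (bit i set means slot i is non-null, least-significant bit first). This is a simplified illustration under stated assumptions; `Int32Column` and its methods are hypothetical names, not the pyarrow API, and real Arrow arrays also handle buffer alignment/padding, offsets for variable-length types, nested types, and much more.

```python
from array import array
from typing import List, Optional


class Int32Column:
    """Hypothetical, simplified Arrow-style primitive column (not pyarrow)."""

    def __init__(self, items: List[Optional[int]]) -> None:
        self.length = len(items)
        # Contiguous buffer of C ints (typically 32-bit). Null slots hold a
        # placeholder 0 here; Arrow leaves their contents unspecified.
        self.values = array("i", (0 if v is None else v for v in items))
        # Validity bitmap: one bit per slot packed into bytes, LSB-first.
        self.validity = bytearray((self.length + 7) // 8)
        for i, v in enumerate(items):
            if v is not None:
                self.validity[i // 8] |= 1 << (i % 8)
        self.null_count = sum(v is None for v in items)

    def is_valid(self, i: int) -> bool:
        # Test bit i of the validity bitmap.
        return bool(self.validity[i // 8] & (1 << (i % 8)))

    def get(self, i: int) -> Optional[int]:
        return self.values[i] if self.is_valid(i) else None

    def sum(self) -> int:
        # A column-at-a-time "kernel": scan the values buffer, skipping nulls.
        return sum(v for i, v in enumerate(self.values) if self.is_valid(i))


col = Int32Column([1, None, 3, 4])
print(bytes(col.validity), col.null_count, col.sum())
```

For `[1, None, 3, 4]`, bits 0, 2 and 3 are set, so the bitmap is the single byte `0x0d`, and the sum skips the null slot. Keeping all values of a column adjacent in memory is what lets a library scan or share a column without converting it row by row.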

Apache Spark is a distributed compute platform, which does have some support for Arrow for interop purposes.

One of the things that led me to get involved in Arrow originally was to explore the idea of building something like Apache Spark based on Arrow (and Rust) and my latest prototype of that concept is in the Ballista project [1].

[1] https://github.com/ballista-compute/ballista