by infinite8s 3573 days ago
The idea behind Apache Arrow (you can see this in the list of people supporting it) is to provide a common serialization/exchange format across different data science tools, languages, and platforms (Hadoop, Spark, pandas, R's data.table). Data scientists typically cobble together a pipeline from several tools to leverage each one's strengths (for example, using Spark to clean up data and then pandas for time-series analysis), and this often involves an expensive serialization/deserialization step at each boundary. The goal of Arrow is to provide a near-zero-cost format that all of these tools can support.
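
To make the boundary cost concrete, here is a toy sketch using only the Python standard library. It is an analogy, not Arrow's actual format or API: the `prices` column and the two "tools" are hypothetical, and `memoryview` simply stands in for "both sides agree on one in-memory layout".

```python
import json
from array import array

# Hypothetical column of float64 values produced by "tool A".
prices = array("d", [101.5, 102.25, 99.75, 100.0])

# Boundary style 1: serialize to text, then parse and rebuild on the other side.
# Every value is converted twice and the consumer ends up with its own copy.
payload = json.dumps(prices.tolist())
copied = array("d", json.loads(payload))

# Boundary style 2: both sides agree on the binary layout, so "tool B"
# just wraps the existing buffer. No values are converted or copied.
shared = memoryview(prices)

# Change the producer's data to show which consumer shares memory with it.
prices[0] = 0.0
print(copied[0])  # 101.5 (independent copy, unaffected)
print(shared[0])  # 0.0 (same underlying buffer)
```

The second style is the rough shape of what a shared format buys you: the handoff cost stops scaling with the size of the data.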