by ianmcook
1955 days ago
Re this second point: Arrow opens up a great deal of language and framework flexibility for data engineering tasks. Pre-Arrow, common data warehouse ETL tasks, like writing Parquet files with explicit control over column types, compression, etc., often meant you needed to use Python, probably with PySpark, or maybe one of the other Spark API languages. With Arrow there are now many more languages where you can code up tasks like this and get consistent results. Less code switching, lower complexity, less cognitive overhead.