Hacker News
by fixxer 3798 days ago
Spark has improved our ETL jobs by orders of magnitude, both in performance and in the ability to engage our workforce (mostly Python programmers).

Previous tools that improved our workflow: Docker, nginx.

2 comments

What is Spark? Google turns up a bunch of projects called that.
I assume they are talking about Apache Spark http://spark.apache.org/
Mind if I ask how you use Spark for your ETL jobs?
Feature engineering. It transforms about 3.5B records into features that feed a variety of models. Previously this was a Hadoop streaming job (~40 hours); now it takes about 6 hours.
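The thread doesn't say how the job is written. As a hedged illustration, assuming a PySpark RDD job, feature engineering of this kind often comes down to keying each record and aggregating per key. The sketch below uses only the Python standard library, so it runs without a Spark install. The record layout `(user_id, event_type, amount)`, the feature names, and the helpers `to_pair`, `merge`, and `build_features` are all hypothetical and are not taken from the poster's pipeline.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical raw record layout: (user_id, event_type, amount).
Record = Tuple[str, str, float]
Partial = Tuple[int, float]  # (event count, summed amount)


def to_pair(record: Record) -> Tuple[str, Partial]:
    """Map step: key each record by user and emit (count=1, amount)."""
    user_id, _event_type, amount = record
    return user_id, (1, amount)


def merge(a: Partial, b: Partial) -> Partial:
    """Reduce step: combine two partial (count, total) pairs.

    Associative and commutative, so partial results can be merged in any order.
    """
    return a[0] + b[0], a[1] + b[1]


def build_features(records: Iterable[Record]) -> Dict[str, Dict[str, float]]:
    """Turn raw records into per-user feature dicts (hypothetical features).

    Single-process analogue of a Spark pipeline along the lines of
    rdd.map(to_pair).reduceByKey(merge).mapValues(...).
    """
    totals: Dict[str, Partial] = {}
    for key, value in map(to_pair, records):
        # Fold each new pair into the running partial for its key.
        totals[key] = merge(totals[key], value) if key in totals else value
    return {
        user: {"event_count": float(count), "mean_amount": total / count}
        for user, (count, total) in totals.items()
    }


if __name__ == "__main__":
    sample = [("u1", "click", 2.0), ("u2", "buy", 10.0), ("u1", "buy", 4.0)]
    print(build_features(sample))
```

In PySpark, the same `to_pair` and `merge` functions could be passed to `rdd.map` and `rdd.reduceByKey`. Because `merge` is associative and commutative, `reduceByKey` can combine partial results within each partition before shuffling data across the cluster.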