| HN Mirror


	by fixxer 3798 days ago
	Spark has improved our ETL jobs by orders of magnitude, both in performance and in our ability to engage our workforce (mostly Python programmers). Previous tools that improved our workflow: Docker, nginx.

2 comments

mfincham 3798 days ago

What is Spark? Google turns up a bunch of projects called that.

tuckerman 3798 days ago

I assume they are talking about Apache Spark http://spark.apache.org/

xfax 3798 days ago

Mind if I ask how you use Spark for your ETL jobs?

fixxer 3797 days ago

Feature engineering. It transforms about 3.5B records into features that go into a variety of models. Previously this was a Hadoop Streaming job (~40 hours); now it takes about 6 hours.
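
For readers wondering what a job like this might look like, here is a rough sketch of the general shape: a per-record function that turns a raw line into features, plus a merge function that combines partial features per key. This is not the actual pipeline described above. The record layout (tab-separated `user_id`, `event`, `amount`), the feature names, and the helper names are all hypothetical, and `build_features` is only a local stand-in for the distributed step.

```python
# Hypothetical record layout: "user_id<TAB>event<TAB>amount".
# None of these names come from the thread; they are illustration only.

def record_to_features(line):
    """Turn one raw record into a (key, partial feature dict) pair."""
    user_id, event, amount = line.rstrip("\n").split("\t")
    features = {
        "n_events": 1,
        "total_amount": float(amount),
        "is_purchase": int(event == "purchase"),
    }
    return user_id, features


def merge_features(a, b):
    """Combine two partial feature dicts for the same key by summing."""
    return {name: a[name] + b[name] for name in a}


def build_features(lines):
    """Local, single-process stand-in for a map followed by a reduce-by-key."""
    merged = {}
    for line in lines:
        key, feats = record_to_features(line)
        merged[key] = merge_features(merged[key], feats) if key in merged else feats
    return merged


# With PySpark, the same two functions could be handed to the cluster as:
#   sc.textFile(path).map(record_to_features).reduceByKey(merge_features)
# where `sc` is a SparkContext and `path` is an assumed input location.
```

Keeping the per-record logic in plain Python functions is one reason a team of Python programmers might find the move from a streaming mapper script fairly painless, since the same functions can be unit-tested locally before being run on a cluster.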
