| HN Mirror

Hacker News


	by boxy310 2427 days ago
	I once had a consulting gig where the customer desperately wanted to build a Spark/Scala ML pipeline for a dataset that was 10 MB. We spent 3 months hammering it together, when a flat Python process would've taken us 2 weeks.

2 comments

snaky 2427 days ago

> This find xargs mawk pipeline gets us down to a runtime of about 12 seconds, or about 270MB/sec, which is around 235 times faster than the Hadoop implementation.

https://adamdrake.com/command-line-tools-can-be-235x-faster-...


buzzkillington 2427 days ago

If you'd sent it off to Mechanical Turk, it would have been done in an afternoon.
