
by idiotclock 2866 days ago

Spark isn't too tricky to dive into, even though you can't really take advantage of it unless you have a big cluster to use :)

If you want to practice data manipulation, and a lot of the map-reduce-type stuff you can do with Spark, I find Pandas useful for small datasets (there's a lot of overlap in functionality as far as DataFrames are concerned).
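As a rough illustration of that overlap, here's a minimal Pandas sketch of the map/group/reduce pattern you'd otherwise write as `withColumn`/`groupBy().agg()` in Spark. The page-view data and column names are made up for the example.

```python
import pandas as pd

# Small, made-up dataset of page views (hypothetical example data)
df = pd.DataFrame({
    "user": ["alice", "bob", "alice", "carol", "bob", "alice"],
    "page": ["home", "home", "docs", "home", "docs", "home"],
    "ms": [120, 340, 560, 80, 210, 150],
})

# "Map" step: derive a new column from each row
df["seconds"] = df["ms"] / 1000

# "Reduce" step: group by key and aggregate, similar to groupBy().agg() in Spark
per_user = df.groupby("user").agg(
    views=("page", "count"),
    total_seconds=("seconds", "sum"),
)

# Filter and sort, similar to Spark's filter/orderBy
busy = per_user[per_user["views"] > 1].sort_values("total_seconds", ascending=False)
print(busy)
```

The named-aggregation syntax in `agg()` needs a reasonably recent Pandas (0.25+). Once the logic works on a small frame, porting it to a Spark DataFrame is mostly a matter of renaming calls.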

For pipeline stuff, definitely take a look at Luigi, but again, without a cluster it'll be less fun. Still, if you try automating tasks with a mini Luigi scheduler on your localhost, it would be good practice.