

by usgroup 2607 days ago

You need representative data: the sort you’ll actually be working with. There’s plenty of it available for free on AWS, e.g. GDELT.

You then need use cases: things to do with the data. Working through them is how you learn to process it, using whatever tools you like.
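To make "use case" concrete, here is a minimal sketch of one: counting events per country per day. The inline sample and its three columns (date, country code, event type) are made up for illustration and are not GDELT's real schema; the function name is hypothetical too. The point is only the shape of the exercise: pick a question, then write the processing that answers it.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: tab-separated rows of (date, country_code, event_type).
# These columns are an assumption for illustration, not GDELT's actual layout.
SAMPLE = (
    "20190101\tUS\tprotest\n"
    "20190101\tUS\tstatement\n"
    "20190101\tFR\tprotest\n"
    "20190102\tUS\tprotest\n"
)


def events_per_country_day(lines):
    """Use case: count events per (date, country) pair."""
    reader = csv.reader(lines, delimiter="\t")
    counts = Counter()
    for date, country, _event_type in reader:
        counts[(date, country)] += 1
    return counts


counts = events_per_country_day(io.StringIO(SAMPLE))
for (date, country), n in sorted(counts.items()):
    print(date, country, n)
```

Once this works on a small file, the same question can be re-asked of a much larger dataset with whatever tool you want to learn, which is where the real practice comes from.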

As for how to set up clusters: worry about that less. It’s becoming more and more commoditised, and it’s the admin part of data engineering anyway.