by usgroup 2607 days ago
You need representative data: the sort you’ll actually be working with. There’s plenty of it available for free on AWS, e.g. GDELT.

You then need use cases: things to do to the data. Working through them is how you learn to process it, using whatever tools you like.

As for how to set up clusters: worry about that less. It’s increasingly commoditised, and it’s the admin part of data engineering anyway.