by karterk 2607 days ago
Most people (regardless of where they are in the world) are not going to be able to afford a cluster of any meaningful size. So the best option is to learn these tools by running them on your local machine. You can use VirtualBox or Docker to set up virtual hosts on your machine and simulate a cluster.

Also, data engineering is a vast field. Pick an area that you would like to pursue first and go deep: machine learning (feature extraction and training), Hadoop/Spark job processing, event streaming and aggregation, etc.

Then try to find an entry-level job that lets you apply what you have learned and expand your knowledge by running actual clusters.