Ask HN: What is a broke person's way to learn data engineering?
10 points by pendergast 2607 days ago
As a broke individual living in a third world country who wants to learn data engineering, how do I go about it? AWS/similar options seem to be super expensive to me, and I don't have access to clusters. Any advice?
6 comments

AWS, GCE, and Azure all offer credits for new accounts to experiment with. You might be limited in some resources, but you can learn a lot. Also, a lot of data tools such as Spark, Airflow, and Kafka can be deployed on Kubernetes, which you can run locally using Minikube. Do that and read a lot of blog posts.
Most people (regardless of where they are in the world) are not going to be able to afford a cluster of any meaningful size. So the best option is to learn these tools by running them on your local machine. You can use VirtualBox or Docker to set up virtual hosts on your machine that simulate a cluster.
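
To make the "simulate a cluster on one machine" idea concrete, here is a minimal sketch in plain Python of the partition / map / shuffle / reduce flow that tools like Hadoop and Spark spread across real nodes. It is a toy model and not any library's API: the function names, the worker counts, and the sample lines are all made up for illustration.

```python
import zlib
from collections import Counter, defaultdict


def partition(records, n_workers):
    """Split records round-robin into n_workers chunks, one per pretend node."""
    return [records[i::n_workers] for i in range(n_workers)]


def map_phase(chunk):
    """Each pretend node counts the words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def shuffle(partials, n_reducers):
    """Route each word to exactly one reducer using a stable hash of the word."""
    buckets = [defaultdict(list) for _ in range(n_reducers)]
    for partial in partials:
        for word, count in partial.items():
            idx = zlib.crc32(word.encode("utf-8")) % n_reducers
            buckets[idx][word].append(count)
    return buckets


def reduce_phase(bucket):
    """Sum the partial counts that arrived at one reducer."""
    return {word: sum(counts) for word, counts in bucket.items()}


def word_count(records, n_workers=3, n_reducers=2):
    """Run the whole flow in one process; a real cluster runs each phase on many hosts."""
    partials = [map_phase(chunk) for chunk in partition(records, n_workers)]
    result = {}
    for bucket in shuffle(partials, n_reducers):
        result.update(reduce_phase(bucket))
    return result


# Made-up input lines, for illustration only.
SAMPLE_LINES = ["a b a", "b c", "a"]
totals = word_count(SAMPLE_LINES)
```

Once this flow makes sense, the same job written against a single-node Spark or Hadoop install in Docker should feel familiar, since the phases are the same and only the machinery around them changes.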

Also, data engineering is a vast field. Pick an area that you would like to pursue first and go deep: Machine learning (feature extraction+training), Hadoop/Spark job processing, event streaming & aggregation etc.

Then try to find an entry-level job that lets you apply what you have learned and expand your knowledge of running actual clusters.

You need representative data; the sort you’ll be working with. There’s plenty of it for free on AWS. E.g. GDELT.

You then need use cases: things to do to the data. From this you learn how to process it using whatever tools you like.
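
A minimal sketch of one such use case, using only the Python standard library: count events per group from a small CSV. The rows and column names below are invented for illustration and are not the schema of GDELT or any other real dataset; swap in whatever file you download.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the column names are hypothetical.
SAMPLE_CSV = """date,country,event_type
2019-01-01,IN,protest
2019-01-01,IN,statement
2019-01-01,US,protest
2019-01-02,IN,protest
"""


def count_events(rows, key_fields):
    """Count rows per distinct combination of the given columns."""
    counts = Counter()
    for row in rows:
        counts[tuple(row[field] for field in key_fields)] += 1
    return counts


# With a real file, replace io.StringIO(...) with open(path, newline="").
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

per_country = count_events(rows, ["country"])
per_day_and_country = count_events(rows, ["date", "country"])

for key, n in per_country.most_common():
    print(key, n)
```

Small questions like "events per country per day" are enough to start with; the same question can later be asked with Spark or SQL to compare how each tool handles it.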

How to set up clusters ... worry about that less. It’s more and more commoditised over time and it’s the admin part of data engineering anyway.

We might be able to give you more targeted advice for your country.

I'm a massive advocate of community colleges over "boot camps" and other such educational vectors, but I'm not sure what infrastructure is already available in your country that would help you get on that path.

Thanks for replying! I'm Indian. I only have access to a laptop at the moment. I'm not sure that educational programs in data engineering are accessible to me, so I was looking to go the self-learning route.
What is data engineering to you? You don't need a cloud provider to set up a "cluster". Look at Kubernetes, where you can run a "cluster" on one machine with multiple pods/services running.
Hi, I can help you with logistics and learning. Can you ping me at cradleofdata@gmail.com?