by jkm2155 2198 days ago
Simple project to help beginners get started with data engineering. This is my first post on HN; any feedback would be greatly appreciated.
1 comment

Thanks for the illuminating post. I like how Apache Airflow is used to move the PySpark script to an S3 location so that it can be read by the EMR step. I remember working on a project where we wanted to automate a data pipeline using Airflow and ran into this same problem of how to get our pipeline scripts to the right locations.
@gifflar Glad you found it illuminating :). Yeah, moving the Spark script to S3 with an Airflow task is usually the easiest approach.
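
For anyone following along, here is a rough sketch of the pattern discussed in this thread: upload the script to S3, then point an EMR step at the uploaded copy. It is not the code from the linked project. The function names, bucket, key, and step name are made up for illustration, and the S3 client is passed in as a parameter (assumed to behave like a boto3 S3 client, whose `upload_file` takes `Filename`, `Bucket`, and `Key`).

```python
from typing import Any, Dict


def upload_script_to_s3(s3_client: Any, local_path: str, bucket: str, key: str) -> str:
    """Upload a local script and return its s3:// URI.

    `s3_client` is assumed to expose upload_file(Filename, Bucket, Key),
    as a boto3 S3 client does. Hypothetical helper, not from the project.
    """
    s3_client.upload_file(Filename=local_path, Bucket=bucket, Key=key)
    return f"s3://{bucket}/{key}"


def build_emr_step(script_uri: str, name: str = "run_pyspark_script") -> Dict[str, Any]:
    """Build an EMR step definition that spark-submits the uploaded script.

    The step name and ActionOnFailure value are illustrative choices.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets an EMR step run a command such as spark-submit
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }
```

In a DAG, the first function could be the callable of a Python task (with `boto3.client("s3")` as the client), and the dict from the second could be passed as one of the steps to an EMR add-steps task. Keeping the client as a parameter makes both pieces easy to test without touching AWS.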