by gifflar 2198 days ago
Thanks for the illuminating post. I like how Apache Airflow is used to move the PySpark script to an S3 location so that it can be read by the EMR step. I remember working on a project where we wanted to automate a data pipeline using Airflow and ran into this exact problem of how to get our pipeline scripts to the right locations.
1 comment

@gifflar Glad you found it illuminating :). Yeah, moving the Spark script to S3 using an Airflow task is usually the easiest approach.
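
For anyone wondering what that looks like, here is a rough sketch of the pattern, not the exact code from the post. The helper names `upload_script` and `spark_step`, the bucket, and the paths are all made up for illustration; the only real calls assumed are boto3's S3 client `upload_file` and the standard EMR step layout that runs `spark-submit` through `command-runner.jar`.

```python
def upload_script(local_path, bucket, key, s3_client=None):
    """Copy a local PySpark script to S3 and return its s3:// URI.

    Hypothetical helper: in an Airflow DAG this could be the callable
    behind a Python task that runs before the EMR step is added.
    """
    if s3_client is None:
        # Imported lazily so the module loads even where boto3 is absent.
        import boto3
        s3_client = boto3.client("s3")
    # boto3 S3 client signature: upload_file(Filename, Bucket, Key)
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


def spark_step(script_uri, name="run_pyspark_script"):
    """Build an EMR step definition that spark-submits the uploaded script.

    Hypothetical helper: the returned dict follows the EMR step format
    (Name, ActionOnFailure, HadoopJarStep); the values are assumptions.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }
```

In a DAG, the first function would typically sit in an upload task and the dict from the second would go into the list of steps handed to the EMR add-steps task, with the upload task set upstream so the script is in place before the step starts.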