by ABeeSea 1989 days ago
Both pricing and start-up times are significantly better in Glue 2.0 (assuming one can migrate). But even on Glue 1.0, orchestrating an ETL process with several dozen jobs takes a non-trivial amount of configuration and labor (job failures, job restarts, paging, job run history, CloudWatch logs, reusable infrastructure as code when creating new jobs, permissions and security, etc.), so the increased cost is more than worth it for us.

https://aws.amazon.com/blogs/aws/aws-glue-version-2-0-featur...

1 comments

We're crawling and processing TBs of web data. We just use some Python workers, Airflow, and SQS, and trigger a few scheduled EMR jobs. Easy peasy. Restarts and whatnot are handled by Kubernetes at the container level and by Airflow at the code level. Airflow bakes in permissions and job management. Glue left a lot to be desired in that area, and $400-600 per ingest can't compete with the $30 it costs for the time the EMR cluster is up. Since we use Kube for everything already, it wasn't much of a hassle to continue using it here. I'm sure in your case it makes sense, and in ours it didn't, and this is why technology is crazy :P
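
For a rough sense of the gap, here is a back-of-the-envelope sketch in Python that uses only the per-ingest figures quoted above. The helper name `cost_ratio` is made up for illustration, and the numbers are the ones from this comment, not anything taken from AWS pricing pages.

```python
# Back-of-the-envelope comparison using only the per-ingest figures quoted above.
# cost_ratio is a hypothetical helper name, not part of any AWS tooling.

def cost_ratio(glue_cost: float, emr_cost: float) -> float:
    """Return how many times more expensive the Glue run is than the EMR run."""
    if emr_cost <= 0:
        raise ValueError("emr_cost must be positive")
    return glue_cost / emr_cost

GLUE_LOW, GLUE_HIGH = 400.0, 600.0  # dollars per ingest, as quoted
EMR_COST = 30.0                     # dollars per ingest, as quoted

low = cost_ratio(GLUE_LOW, EMR_COST)
high = cost_ratio(GLUE_HIGH, EMR_COST)
print(f"Glue is roughly {low:.0f}x to {high:.0f}x the EMR cost per ingest")
```

This only compares the compute bill. It leaves out the engineering time spent running Airflow and Kubernetes, which is the part the parent comment is paying Glue to avoid.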