by Grimm1
1989 days ago
We're crawling and processing TBs of web data, and we just use some Python workers, Airflow, and SQS, and trigger a few scheduled EMR jobs. Easy peasy. Restarts and whatnot are handled by Kubernetes at the container level and by Airflow at the code level. Airflow bakes in permissions and job management, and Glue left a lot to be desired in that area. Also, $400-600 per ingest can't beat the $30 we spend for the time the EMR cluster is up. Since we already use Kube for everything, it wasn't much of a hassle to keep using it here. I'm sure it makes sense in your case; in ours it didn't, and this is why technology is crazy :P