by kommissar 2051 days ago
> We use Glue extensively, but we have a rule of thumb not to use any of the 'special sauce'. That means using it purely for 'Spark as a service'.

This is spot on IMO. I use Glue internally (opinions are my own) and still believe the best course of action is to have Glue run only managed Spark. We give Glue an empty Scala "script" that does nothing on its own, load a compiled JAR containing the Scala code that actually runs our job as a library, and have Glue exec into that.

We can version the ETL in git, run local tests outside of the Glue data plane, prototype in the Spark shell, and much more.
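
For anyone who wants to picture the setup, here is a rough sketch of the split, not our actual code. Everything named here is hypothetical: `EtlJob`, `parseArgs`, `transform`, the `--input`/`--output` parameters, the `id` column, and the Parquet format. It also assumes the runtime already supplies the Spark configuration, so `SparkSession.builder().getOrCreate()` is enough, and that job parameters arrive as `--key value` pairs.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Library side: lives in git and is compiled into the JAR.
// All names in this object are hypothetical, for illustration only.
object EtlJob {
  // Collect "--key value" pairs into a map; anything else is ignored.
  def parseArgs(args: Array[String]): Map[String, String] =
    args.sliding(2).collect {
      case Array(k, v) if k.startsWith("--") && !v.startsWith("--") =>
        k.drop(2) -> v
    }.toMap

  // Plain DataFrame-to-DataFrame logic, so it can be exercised with a
  // local SparkSession and no Glue at all.
  def transform(input: DataFrame): DataFrame =
    input.filter(col("id").isNotNull)

  // Entry point the script calls. Throws NoSuchElementException if
  // --input or --output is missing.
  def run(args: Array[String]): Unit = {
    val opts = parseArgs(args)
    // Assumption: the runtime has already configured Spark.
    val spark = SparkSession.builder().getOrCreate()
    val input = spark.read.parquet(opts("input"))
    transform(input).write.mode("overwrite").parquet(opts("output"))
  }
}

// Script side: the near-empty "script" handed to Glue. It holds no logic
// and only delegates to the library on the classpath.
object GlueApp {
  def main(sysArgs: Array[String]): Unit =
    EtlJob.run(sysArgs)
}
```

With this shape, a local test can build a session with `SparkSession.builder().master("local[*]").getOrCreate()` and call `EtlJob.transform` directly, and the same function can be tried out in the Spark shell once the JAR is on the classpath.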