Hacker News
by quadrature 2232 days ago
The problem being solved here is resource tuning, which you will eventually run into as your data org grows. In our case, the authors of our Spark jobs understand the data modelling well but might not know how to tweak the Spark parameters to optimize execution. As mentioned in the post, even if you do know what you're doing, the process is long and time-consuming, so I definitely see the value add here.
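
To make "tweaking the Spark parameters" concrete, here is a rough sketch of the kind of sizing arithmetic that usually gets done by hand. The configuration keys are real Spark settings, but `NodeSpec`, `suggest_spark_conf`, `to_submit_args`, and the rules of thumb inside them (one core and 1 GB held back per node, about five cores per executor, roughly 10% of memory left for overhead) are assumptions for illustration, not what the post's tool does.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    """Hypothetical description of one worker node."""
    cores: int
    memory_gb: int


def suggest_spark_conf(node: NodeSpec, num_nodes: int, cores_per_executor: int = 5) -> dict:
    """Derive a starting-point Spark config from cluster shape (rule of thumb only)."""
    # Assumption: hold back 1 core and 1 GB per node for the OS and daemons.
    usable_cores = max(1, node.cores - 1)
    usable_mem_gb = max(1, node.memory_gb - 1)

    executor_cores = max(1, min(cores_per_executor, usable_cores))
    executors_per_node = max(1, usable_cores // executor_cores)

    # Assumption: keep about 10% of each executor's share for off-heap overhead.
    mem_share_gb = usable_mem_gb // executors_per_node
    heap_gb = max(1, int(mem_share_gb * 0.9))

    # Leave one executor slot free for the driver.
    total_executors = max(1, executors_per_node * num_nodes - 1)

    return {
        "spark.executor.cores": str(executor_cores),
        "spark.executor.memory": f"{heap_gb}g",
        "spark.executor.instances": str(total_executors),
        # Assumption: about two shuffle partitions per available core.
        "spark.sql.shuffle.partitions": str(total_executors * executor_cores * 2),
    }


def to_submit_args(conf: dict) -> list:
    """Turn a config dict into `--conf key=value` arguments for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    conf = suggest_spark_conf(NodeSpec(cores=16, memory_gb=64), num_nodes=10)
    print(" ".join(to_submit_args(conf)))
```

Numbers like these are only a first guess; the slow part is rerunning the job and adjusting them against real data, which is where automated tuning would save time.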

If you need ephemeral Spark clusters, Dataproc in GCP will give you that; there's probably a similar service in AWS and Azure.

1 comment

AWS EMR is a fairly straightforward and reasonably cost-effective way to manage ephemeral Spark clusters on Amazon Web Services.