|
|
|
|
|
by quadrature
2232 days ago
|
|
The problem being solved here is resource tuning, which you will eventually run into as your data org grows. In our case, the authors of our Spark jobs understand the data modelling well but might not know how to tweak the Spark parameters to optimize execution. As mentioned in the post, even if you do know what you're doing, the process is long and time-consuming, so I definitely see the value add here. If you need ephemeral Spark clusters, Dataproc in GCP will give you that; there's probably a similar service in AWS and Azure. |
|