by __vb__
2232 days ago
Databricks has other optimizations on top of the open source version of Spark. Are you maintaining your own version of Spark or using the vanilla version? One thing I constantly deal with is how to optimize Spark: how to use Ganglia and the Spark UI to dig into what is causing data skew and slowness while running jobs. Is this something that you do better than Databricks?
Optimization/Monitoring: This topic is very important to us, thanks for bringing it up. We do automatically tune configurations, but developers still need to understand the performance of their app to write better code. We're working on a Spark UI + Ganglia improvement (well, replacement really), which we could potentially open source.
Would you mind emailing me (jy@datamechanics.co) or even scheduling a call with me (https://calendly.com/b/datamechanics/avk7bhxq) so I can show you what we have in mind and get your feedback? Anyone else interested is welcome to do the same.