Great start! If you keep at it, I'd love to see more of the advanced stuff. I feel like we're all hitting problems like skew, and it would be cool to have a reference for dealing with them.
Hey quadrature, thanks for the feedback! Could you go into more detail about the skew you're seeing? :)
In Chapter 7 I go into some methods for fixing skewed data when performing joins. This solved the majority of our skew problems, but I believe we still see skew on aggregates. I'm working on how to debug and find skew in a Spark application in Chapter 6; I wanted to release this initially because I've been procrastinating on it for over 2 years lol.
We've also done more Spark parameter optimizations, but those only help once the data skew has been resolved.
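For anyone curious what "fixing skewed data when performing joins" can look like, here is a minimal plain-Python sketch of key salting, one common technique for skewed joins. It is not necessarily the method from Chapter 7 and it does not use the Spark API. Everything here is illustrative: `skew_ratio`, `salt_left`, `explode_right`, and `hash_join` are hypothetical helpers, rows per key stand in for partition size, and `hash_join` is an in-memory stand-in for a shuffle join.

```python
import random
from collections import Counter

def skew_ratio(keys):
    """Heaviest key's row count divided by the mean rows per distinct key (1.0 = perfectly even)."""
    counts = Counter(keys)
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean

def salt_left(rows, num_salts, rng):
    """Large (skewed) side: pair each key with a random salt in [0, num_salts)."""
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]

def explode_right(rows, num_salts):
    """Small side: replicate each row once per salt so every salted left key finds its match."""
    return [((key, s), value) for key, value in rows for s in range(num_salts)]

def hash_join(left, right):
    """In-memory inner join on exact key equality (stand-in for a shuffle join)."""
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]

# Hypothetical data: one "hot" key dominates the large side.
rng = random.Random(0)
left = [("hot", i) for i in range(1000)] + [(f"k{i}", i) for i in range(10)]
right = [("hot", "H")] + [(f"k{i}", f"R{i}") for i in range(10)]

salted_left = salt_left(left, 8, rng)
plain = hash_join(left, right)
salted = hash_join(salted_left, explode_right(right, 8))

print(round(skew_ratio([k for k, _ in left]), 2))         # skew before salting
print(round(skew_ratio([k for k, _ in salted_left]), 2))  # skew after salting
print(len(plain), len(salted))                            # same number of joined rows
```

The trade-off is that the small side grows by a factor of `num_salts`, so salting pays off only when one side is small enough to replicate and the hot keys are known or detectable. Dropping the salt from the joined keys gives back exactly the unsalted result.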