by ericxiao251 2681 days ago
Hey quadrature, thanks for the feedback! Could you go into more detail about the skew you're seeing? :)

In Chapter 7 I go into some methods for fixing skewed data when performing joins. This solved the majority of our skew problems, but I believe we still see skew on aggregates. I'm still working on Chapter 6, which covers how to debug and find skew in a Spark application; I wanted to get an initial release out, since I've been procrastinating on it for over 2 years lol.

We have also done more Spark parameter tuning, but that helps after the data skew has been resolved.