Hm, is your hypothesis that code from women who are identifiable as women is more often disagreeable than code from women who are not?
Consider the method they used to identify gender: they based it on the email being linked to a Google+ account. This rules out most submissions made as part of corporate work, presumably because work email addresses are rarely tied to a Google+ profile.
So if corporate users submit better code than hobbyists, then you would expect a drop in acceptance for the identifiable group regardless of gender. And you do see a significant drop across the board for gender-identifiable vs gender-indeterminate contributions.
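
To make the composition argument concrete, here is a rough sketch with made-up numbers. Every figure in it is a hypothetical assumption for illustration, not data from the study: the acceptance rates for corporate and hobbyist contributors, and the corporate share in each group, are invented. Gender does not appear anywhere in the calculation.

```python
def blended_acceptance(corporate_share, corporate_rate, hobbyist_rate):
    """Acceptance rate of a group that mixes corporate and hobbyist contributors."""
    return corporate_share * corporate_rate + (1 - corporate_share) * hobbyist_rate


# Hypothetical per-contributor-type acceptance rates (assumed, not measured).
CORPORATE_RATE = 0.80
HOBBYIST_RATE = 0.60

# Hypothetical group composition: the Google+ filter leaves few corporate
# contributors in the identifiable group, while the indeterminate group keeps most.
identifiable = blended_acceptance(0.10, CORPORATE_RATE, HOBBYIST_RATE)
indeterminate = blended_acceptance(0.60, CORPORATE_RATE, HOBBYIST_RATE)

print(f"identifiable:  {identifiable:.2f}")
print(f"indeterminate: {indeterminate:.2f}")
```

With these assumed inputs the identifiable group comes out lower (0.62 vs 0.72) purely because of who ends up in it, which is the kind of across-the-board drop described above.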