by niel 1121 days ago
Speculation on my part, but I believe we non-native English speakers write more formally and with less natural flow.

I also wonder what proportion of English writing in general is written by non-native speakers, and whether we might be disproportionately represented in training data.


Yes, I'd speculate along with you that this is not "bias" but simply probability space: sampled input, styled output.

Had the training cutoff predated SEO, "content generation" farms, and the shift in the balance of published academic writing, the embedding space would be different.