You can see the strong bias towards egalitarian solutions in all models, including the open-weight ones without external alignment harnesses. The one thing I noticed right away working with post-GPT-2 models is that, in general, they tend to be "better people" than most people are.

I strongly suspect that this is because training data harvested from the internet largely falls into two categories: various kinds of trolls and antisocial caricatures, and people putting their best foot forward to represent themselves favourably. The first category is generally easy to filter out using simple tools.