Hacker News
by smoldesu 926 days ago
Well, text is political. You're not going to say "Tiananmen Square" without a political sentiment, so your only option would be to censor it.

LLMs are text tokenizers; if the majority of their training material leans liberal or conservative, then the output should reflect that. I think a better idea is to avoid relying on glorified autocorrect for anything related to political drama.

1 comment

> You're not going to say "Tiananmen Square" without a political sentiment

you just did.

Actually, the place itself (https://en.wikipedia.org/wiki/Tiananmen_Square) is no more controversial than the National Mall in Washington, DC. It's what happened there on one day that is suppressed.
With that in mind, what would a truly apolitical representation of Tiananmen Square be, in terms of AI training data?