by belorn
1227 days ago
Correct me if I interpreted you wrong here, but I often see statements implying that hateful sentences towards some groups, like white people, men, heterosexuals and so on, are not "real" hate. The implication is that those are just ironic comparisons, jokes, or tropes. At the same time, we can read research and popular science saying that boys and men in general feel more isolated and unwanted in society, with increased rates of depression and suicide. The rate of violence towards men in society also seems to be on the rise, and male help-lines report being both underfunded and overloaded with people seeking help. It is very far from being a joke, and the consequences are very much real. A proper AI moderator could attempt to quantify the effect hate speech has on society, but that effect is generally only clear in hindsight. I think there is a good argument for treating all hate speech as potentially risky to society, in which case the distinction of whom the hate is directed at is irrelevant. Hate is hate. If people want to hate people who wear sandals as a proxy for a specific demographic, then hate towards sandal-wearing people remains a problem for society.
|
And the thing about an LLM is, if there's a mass outpouring of hate (and sympathy) towards sandal wearers, or a particular term is widely used as a proxy for another group, or a majority group is the subject of some really inappropriate stuff, an LLM will actually tend to pick that up. It will be more likely to rate sentences expressing possibly negative sentiment towards them as instances of hate speech than statements expressing the same possibly negative sentiment towards a brand name, a day of the week, an anonymous boss or a species of tree. It won't do it perfectly (however you define "perfectly"), but it looks a lot better than some of the proposed alternatives...
In theory, it would be possible to train or constrain it to ignore the reality of human discourse and attach no weight at all to the subject of the negative sentiment when determining whether it's "hate speech" or not. But I'm not sure why we'd want to go to the effort of convincing a chatbot that if it's OK to say "people who demand discounts are greedy" then it's OK to say "Jews are greedy", or that "gay people should be banned", "fit people should be banned" and "Nazis should be banned" are all equally likely to be hate speech.