| That's a good question. But an even better one would be "where would you set your parameters for absence of bias" with this test? I mean, take 6,774 sentences expressing negative sentiments about "gay people". I'm guessing that you're familiar with the fact that a lot of people do write these sentences, and many of them are utterly dead serious about it and genuinely do hate, or at least feel a certain amount of contempt for, gay people (and that sometimes there are actual consequences from this, the avoidance of which is sort of the whole point of ChatGPT policing "hate speech"). And take 6,774 sentences expressing the same negative sentiments about "straight people". It's probably safe to assume that some of these have never been written in the history of human discourse except for the purposes of testing ChatGPT. For others, the ratio of real-world use to bully heterosexuals, as opposed to making ironic comparisons to popular anti-gay tropes or casual jokes, is going to be very, very different. The author didn't test 6,774 sentences expressing negative sentiments towards non-human stuff that's unlikely to be valued by anybody else, like "my own shoes", to see what proportion of those were classed as hate speech, but I think we can probably all agree that none of them should be. The proportion of sentences deemed hate speech was around 80% for "gay people" and around 70% for "straight people". Is that 70% an underestimate because it's not the same as for gay people? Or is it actually a massive overestimate because in actual real-world use (which ChatGPT does have some data on...) sentences about "straight people" aren't much more likely to be used for the purposes of bullying, harassment or hate campaigns than sentences about "my own shoes"? More interesting, perhaps, is the fact that it's much, much happier with people applying negative adjectives to political groups than to vulnerable sexual orientations like heterosexuality.
Unlike the supposed bias towards certain sexualities or ethnic groups, this is a bias which is clearly very unrepresentative of how hateful statements are actually likely to be. When people say bad things about Democrats or Republicans or liberals or conservatives, they often really, really mean it. But is it a bad bias to be more permissive of saying that political groups are "wrong" or "untrustworthy" or "greedy", or is it simply permitting stuff which is [i] often more likely to be fair comment, because we're criticising attitudes of groups people joined rather than innate characteristics, and [ii] arguably more necessary for free political debate, and [iii] much more tolerated by liberals and conservatives alike? (And if we're going down the "more likely to be fair comment" route, what exactly are the sentences, and do they - coincidentally or otherwise - happen to just map less to "fair comment" about one political group than another?) |
At the same time, we can read research and popular science saying that boys and men in general feel more isolated and unwanted in society, with increased rates of depression and suicide. The rate of violence towards men in society also seems to be on the rise, and male helplines report being both underfunded and overloaded with people seeking help. It is very far from being a joke, and the consequences are very much real.
A proper AI moderator could attempt to quantify the effect hate speech has on society, but that effect is generally only clear in hindsight. I think there is a good argument for treating all hate speech as potentially risky to society, in which case the distinction of whom the hate is directed at is irrelevant. Hate is hate. If people want to hate people who wear sandals as a proxy for a specific demographic, then hate towards sandal-wearing people remains a problem for society.