Hacker News
by PhDuck 1855 days ago
The main problem with both algorithms is that they seem overly sensitive; many times, neither of the options seemed offensive.
1 comment

That's true. I could have added a "Neither is offensive" button, but my goal was just to show that you can't really tell the difference between what Twitter is doing and random chance, and I thought that "Neither" would dilute that.
I think the quiz as it stands is a good way to demonstrate how bad Twitter is. In particular, I got a few pairs of tweets that were identical and had to pick one as offensive at random.

I think a better measure of a filter's overall accuracy would be to show one tweet and response at a time, have the user pick offensive or not offensive, and then compare those answers to what the algorithms decided.

An earlier idea I had for this was to put two tweets side by side and ask the user to say which tweet was more offensive. Then, I figured I could use the Elo rating system to come up with an "offensiveness score" for each tweet, e.g. "This tweet has an offensiveness of 2200" or something like that. I could then compare the average offensiveness of tweets that Twitter considered offensive versus those it did not.
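For anyone curious what that Elo idea might look like, here is a minimal sketch. It is not code from the quiz; the function names, the starting rating of 1500, and the K-factor of 32 are all assumptions chosen for illustration. Each comparison is treated as a "match" that the tweet picked as more offensive "wins".

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A 'wins' against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rate_tweets(comparisons, start=1500.0, k=32.0):
    """comparisons: iterable of (more_offensive_id, less_offensive_id).

    Returns a dict mapping tweet id -> offensiveness rating.
    start and k are illustrative defaults, not values from the quiz.
    """
    ratings = {}
    for winner, loser in comparisons:
        rw = ratings.setdefault(winner, start)
        rl = ratings.setdefault(loser, start)
        ew = expected_score(rw, rl)
        # The winner gains exactly what the loser gives up.
        ratings[winner] = rw + k * (1.0 - ew)
        ratings[loser] = rl - k * (1.0 - ew)
    return ratings


def mean_rating(ratings, tweet_ids):
    """Average rating over the given ids, skipping unrated tweets."""
    rated = [ratings[t] for t in tweet_ids if t in ratings]
    return sum(rated) / len(rated) if rated else None
```

With something like this, `mean_rating(ratings, flagged_ids)` and `mean_rating(ratings, unflagged_ids)` would give the two averages to compare, where `flagged_ids` and `unflagged_ids` are hypothetical lists of the tweets Twitter did and did not mark as offensive.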

I wound up not going with that approach because many times you just have two completely innocuous tweets, and picking which of the two is "more offensive" is just arbitrary. I could have curated the tweets so that only ones that were kind of offensive were in the quiz, but then I might be putting my thumb on the scale to get the answer I already believed in. I'd also need to get lots of ratings for each tweet to have a stable score.

I think your idea is pretty interesting. It would allow a conclusion like "The average tweet Twitter marks as offensive is X% likely to offend a rater." I was coming at it more from a "Twitter's offensive identification is like random chance" perspective rather than just trying to assess quality. If I had considered this idea while creating the quiz I might have gone with it!
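As a rough sketch of how that "X% likely to offend a rater" number could be computed from one-tweet-at-a-time answers: the data layout below (a list of `(tweet_id, found_offensive)` votes and a set of ids the algorithm flagged) is an assumption for illustration, not how the quiz actually stores anything.

```python
def offense_rates(votes, flagged_ids):
    """votes: iterable of (tweet_id, found_offensive) pairs, one per rater answer.
    flagged_ids: set of tweet ids the algorithm marked as offensive.

    Returns the fraction of votes that said 'offensive' within each group,
    or None for a group that received no votes.
    """
    counts = {"flagged": [0, 0], "unflagged": [0, 0]}  # [offensive votes, total votes]
    for tweet_id, found_offensive in votes:
        group = "flagged" if tweet_id in flagged_ids else "unflagged"
        counts[group][0] += 1 if found_offensive else 0
        counts[group][1] += 1
    return {
        group: (hits / total if total else None)
        for group, (hits, total) in counts.items()
    }
```

If the two rates come out about the same, that would support the "no better than random chance" reading; a clearly higher rate for the flagged group would suggest the filter is picking up on something raters also notice.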

Dunno, the quiz is pretty much impossible to complete as it is now. I never managed to get past the second question without hitting a pair where neither tweet was offensive.