Hacker News
by CivBase 1903 days ago
But they don't give background for how they chose the terms they used as inputs. They just hand-wave through that part of the methodology.

All you have to do is cherry pick the inputs and keep re-running until you have the desired output. Garbage in, garbage out.

> 58 well-known hate terms and phrases

Funny enough, I didn't recognize many of those "well-known" terms and every single one of them ended up being white supremacist.

On top of that, 58 is a pathetically small dataset to draw conclusions from. You'll need far more than that to credibly demonstrate a bias. I can probably come up with that many racial slurs off the top of my head and I bet you anything the overwhelming majority of them will be blacklisted.


Their GitHub repo has a link to a further description of each hate term. Some are hate groups. Some are deliberate misinformation about marginalized groups. Some are phrases intended to sound non-offensive to a general audience while edging racist ideas into the mainstream.

Also, hate groups tend to cycle through new terminology whenever the previous batch becomes easily recognized by a general audience.

https://github.com/the-markup/investigation-youtube-ad-place...

Describing what makes a term hateful is not the same as explaining how it got on the list. There are plenty of better known hate terms which are conspicuous by their absence. Naturally, the more obscure the term the less likely it is to be caught by the blacklist.

How is it 2021 and no one questions this BS? IMO any group that focuses on affirming the identity of fictional races IS a hate group. A racially themed group pushing its own propaganda is always a hate group. We have the human species, with just variations in hair, skin and eye color. Can’t we get past that simple concept?

As the methodology states, they compiled a list of 2,039 terms, and did the manual and automated preprocessing work to dedupe that list (the result of which seems to be here [0]). They subsequently vetted the final list in collaboration with Harvard's Shorenstein Center.

> Funny enough, I didn't recognize many of those "well-known" terms and every single one of them ended up being white supremacist.

You don't recognize the phrases "kkk", "holocaust denial", "white power", "ethnic cleansing", or "daily stormer"? How is it the story's fault that you haven't encountered these well-known phrases and entities?

In any case, the code and source list of keywords are all in their repo. You're accusing them of "cherry-picking" when they've provided the entirety of their search space and API output? If you have a better idea for assessing YouTube's policy vs. its actual implementation, or you just feel that the methodology is easy to invalidate, there's nothing stopping you from cloning the repo and running your own automated check and analysis. I mean, other than the fact that Google obfuscated their API in response to this story.

[0] https://github.com/the-markup/investigation-youtube-ad-place...

> As the methodology states, they compiled a list of 2,039 terms, and did the manual and automated preprocessing work to dedupe that list (the result of which seems to be here [0]). They subsequently vetted the final list in collaboration with Harvard's Shorenstein Center.

So... they started with a massive list manually picked from a series of organizations with clear political leanings (no filter criteria given), manually filtered that list from there (no filter criteria given), then vetted it with an unnamed group of people whose only credibility given is an association with a research center at Harvard (still no filter criteria given). I see plenty of opportunity for cherry picking here, and the resulting list should speak for itself.

> You don't recognize the phrases "kkk", "holocaust denial", "white power", "ethnic cleansing", or "daily stormer"?

I did not recognize "daily stormer". But of course the rest of your picks are among the most recognizable from the list. No good faith here, I guess.

Here's a sample of some terms I did not recognize: "2083: a european declaration of independence", "black sun", "blood and soil", "identity evropa", "red ice tv", or "white sharia" just to name a few.

> there's nothing stopping you from cloning the repo and running your own automated check and analysis.

There isn't enough time in 100 lives to dedicate to discrediting nonsense like this. Even if I put in the time and effort to do this better with no compensation, I don't have a sliver of the publicity they do. HN comments are much more cost-effective.