by bigth 1903 days ago
I read the whole article and still don't know what they discovered. Search terms that are highly abused and unlikely to be profitable or legitimate are banned...? What was the discovery?

The asymmetry is the bigger part. Advertising allows for community growth by pointing to more resources and information. Essentially, "If you support X, here is a group working on the topic of X." When this is allowed for X="white power" but forbidden for X="black lives matter", that's a huge disparity. That means ads cannot be used to recruit against police violence, but can be used to recruit for white supremacy.
It's very easy to make a list like this look asymmetrical when your article cherry picks the most outrage-inducing results from your test and disregards any results that don't fit your narrative.
The story links to a GitHub repo [0] with code, input data, and several notebooks, as well as a lengthy methodology [1].

[0] https://github.com/the-markup/investigation-youtube-ad-place...

[1] https://themarkup.org/google-the-giant/2021/04/08/how-we-dis...

But they don't give background for how they chose the terms they used as inputs. They just hand-wave through that part of the methodology.

All you have to do is cherry pick the inputs and keep re-running until you have the desired output. Garbage in, garbage out.

> 58 well-known hate terms and phrases

Funny enough, I didn't recognize many of those "well-known" terms and every single one of them ended up being white supremacist.

On top of that, 58 is a pathetically small dataset to draw conclusions from. You'll need far more than that to credibly demonstrate a bias. I can probably come up with that many racial slurs off the top of my head and I bet you anything the overwhelming majority of them will be blacklisted.
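For a rough sense of how much uncertainty a sample of 58 carries, here is a minimal sketch of a 95% Wilson score interval for a blocked-term proportion. The count of 20 blocked terms is purely hypothetical and is not a figure from the article or the repo; `wilson_interval` is an illustrative helper, not anything the investigation used.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical count, NOT taken from the article: 20 of 58 terms blocked.
low, high = wilson_interval(20, 58)
print(f"blocked share plausibly between {low:.2f} and {high:.2f}")
```

With a sample this small the interval spans a wide range, so whether 58 terms is enough depends on how large the gap between the two lists actually is.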

Their GitHub repo has a link to a further description of each hate term. Some are hate groups. Some are deliberate misinformation about marginalized groups. Some are phrases intended to sound non-offensive to a general audience, while edging racist ideas into the mainstream.

Also, hate groups tend to cycle through new terminology whenever the previous batch becomes easily recognized by a general audience.

https://github.com/the-markup/investigation-youtube-ad-place...

Describing what makes a term hateful is not the same as explaining how it got on the list. There are plenty of better-known hate terms which are conspicuous by their absence. Naturally, the more obscure the term, the less likely it is to be caught by the blacklist.
How is it 2021 and no one questions this BS? IMO any group that focuses on affirming the identity of fictional races IS a hate group. A racially themed group pushing its own propaganda can never not be a hate group. We have the human species and just variations in hair, skin and eye color. Can’t we get past that simple concept?
As the methodology states, they compiled a list of 2,039 terms, and did the manual and automated preprocessing work to dedupe that list (the result of which seems to be here [0]). They subsequently vetted the final list in collaboration with Harvard's Shorenstein Center.

> Funny enough, I didn't recognize many of those "well-known" terms and every single one of them ended up being white supremacist.

You don't recognize the phrases "kkk", "holocaust denial", "white power", "ethnic cleansing", or "daily stormer"? How is that the story's fault that you haven't encountered these well-known phrases and entities?

In any case, the code and source list of keywords are all in their repo. You're accusing them of "cherry-picking" when they've provided the entirety of their search space and API output? If you have a better idea for assessing YouTube's policy vs. actual implementation, or you just feel that the methodology is easy to invalidate, there's nothing stopping you from cloning the repo and running your own automated check and analysis. I mean, other than the fact that Google obfuscated their API in response to this story.

[0] https://github.com/the-markup/investigation-youtube-ad-place...

> As the methodology states, they compiled a list of 2,039 terms, and did the manual and automated preprocessing work to dedupe that list (the result of which seems to be here [0]). They subsequently vetted the final list in collaboration with Harvard's Shorenstein Center.

So... they started with a massive list manually picked from a series of organizations with clear political leanings (no filter criteria given), manually filtered that list from there (no filter criteria given), then vetted it with an unnamed group of people whose only credibility given is an association with a research center at Harvard (still no filter criteria given). I see plenty of opportunity for cherry picking here, and the resulting list should speak for itself.

> You don't recognize the phrases "kkk", "holocaust denial", "white power", "ethnic cleansing", or "daily stormer"?

I did not recognize "daily stormer". But of course the rest of your picks are among the most recognizable from the list. No good faith here, I guess.

Here's a sample of some terms I did not recognize: "2083: a european declaration of independence", "black sun", "blood and soil", "identity evropa", "red ice tv", and "white sharia", just to name a few.

> there's nothing stopping you from cloning the repo and running your own automated check and analysis.

There isn't enough time in 100 lives to dedicate to discrediting nonsense like this. Even if I put the time and effort into doing this better with no compensation, I don't have a sliver of the publicity they do. HN comments are much more cost effective.

“Black Lives Matter” is blocked, “All Lives Matter” is not. That’s one thing I took from this.

I also took from it that their handling of such keywords is poorly implemented, a token effort at best.

So the one that excludes several races and is the name of an extremist organization is blocked and the other is not. Surprised Pikachu.