| HN Mirror


	by yorwba 2839 days ago
	Check Baidu: http://www.baidu.com/s?ie=utf-8&wd=%22american+scientists%22 Result #6 is the list of African-American inventors and scientists on Wikipedia. Unless Baidu has the same ideological biases as Google (would be strange), the most likely explanation is that it's driven by n-gram frequencies.

hashr8064 2839 days ago

Yes, that's precisely what I would expect from an NLP system b/c it will find "African American" (and, I would expect, "Chinese American", etc.) in documents more frequently than a plain "American", much like what this article mentions about bananas and no one ever mentioning that they are yellow. Still, the approach would have to be pretty naive not to recognize that "X-American" is a subset of "American". It would be like not recognizing that a query for "anonymous function" is something different from a query for "function".

Here's the underlying data at duckduckgo: https://duckduckgo.com/?q=american+scientists&t=h_&ia=list

I'm still interested in a possible technique which could lead to this type of bias without it being explicit (or requiring google to have an extremely naive approach).

sidibe 2839 days ago

I don't see why you can't just accept that that naive approach is their approach? Those two words almost always occur together as part of "-American scientist." This happens to work very well in general for search engines. I don't think Google or DuckDuckGo is hoping their image page for American Scientist just returns African Americans and are therefore subtly changing their algorithm to that end.
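A minimal sketch of the co-occurrence effect described above, in Python. The toy titles and the helper names (`tokenize`, `preceding_word_counts`) are made up for illustration and say nothing about how any real search engine is built; the point is only that once "African-American" is split on the hyphen, the bigram "american scientists" matches it, and in text the bigram is often preceded by a qualifier.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on non-letters, so "African-American" -> ["african", "american"].
    return re.findall(r"[a-z]+", text.lower())

def preceding_word_counts(docs, bigram):
    # For every occurrence of the bigram, record the token that comes right before it.
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i in range(len(tokens) - 1):
            if (tokens[i], tokens[i + 1]) == bigram:
                prev = tokens[i - 1] if i > 0 else "<start>"
                counts[prev] += 1
    return counts

# Hypothetical document titles, invented for this example.
TOY_DOCS = [
    "List of African-American scientists and inventors",
    "Famous African American scientists in history",
    "Chinese-American scientists honored at ceremony",
    "American scientists publish new study",
    "The scientists at the lab are from many countries",
]

counts = preceding_word_counts(TOY_DOCS, ("american", "scientists"))
for word, n in counts.most_common():
    print(word, n)
```

In this toy corpus most matches for the bigram come from titles where a qualifier precedes "American", so a ranker that only counts bigram hits would surface those titles without any explicit rule about them.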

Udik 2839 days ago

> I don't see why you can't just accept that that naive approach is their approach?

I very strongly doubt their approach is based on substring search. They're obviously using a knowledge graph. And if you try a search for "American economists" or "American philosophers", the results look much more like what you'd expect: either the "American" in this case is not a substring of "African-American", or they simply thought that economics and philosophy aren't as worthy of an equality boost as STEM disciplines.

sidibe 2839 days ago

> I very strongly doubt their approach is based on substring search.

You don't think Google search is using 2-grams? Do you think they're conspiring with other search engines? https://www.bing.com/images/search?q=american+scientist

Udik 2839 days ago

Good point, you're right (and btw, that's a crappy result from Bing!). As a verification, I did another experiment:

https://www.google.com/search?q=usa+scientists

apparently gives results from the same knowledge graph and displays them under the same heading, but orders them differently from:

https://www.google.com/search?q=american+scientists

So apparently Google, while understanding the search query, still orders the results by the words used to express it, and in the second case clearly privileges African-Americans because of the "American" substring. My bad.
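A rough sketch of how the same candidate set could come back in a different order depending on the query wording. This is an assumption-laden toy, not Google's ranking: the titles are invented, and `rank` is a hypothetical function that scores each title by how many of its tokens appear in the query.

```python
import re

def tokenize(text):
    # Lowercase and split on non-letters, so "African-American" -> ["african", "american"].
    return re.findall(r"[a-z]+", text.lower())

def rank(query, titles):
    # Score each title by the number of its tokens that appear in the query.
    query_tokens = set(tokenize(query))

    def score(title):
        return sum(1 for tok in tokenize(title) if tok in query_tokens)

    # sorted() is stable, so ties keep their original relative order.
    return sorted(titles, key=score, reverse=True)

# Hypothetical candidate titles standing in for one shared result set.
TITLES = [
    "Scientists from the USA",
    "African-American scientists and inventors",
    "Physicists and chemists",
]

by_american = rank("american scientists", TITLES)
by_usa = rank("usa scientists", TITLES)
print(by_american[0])
print(by_usa[0])
```

With these toy titles, the "american scientists" query puts the hyphenated title first because both query words match it, while "usa scientists" puts the USA title first, even though the candidates are identical.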

jf- 2839 days ago

Have you considered that there may not be as many African American economists as there are scientists, doctors and inventors and that is the reason the search behaves differently? Do you have any substantive basis for your claim that google is racially biasing their results?

Udik 2839 days ago

It's strange that you say there is no bias in the results, because in another comment you have even proposed a mechanism to explain it:

"A black inventor would be seen as out of the ordinary, and so be referred to as an African American inventor, stacking the deck in favour of those results."

So the results look biased. Now, it might just be by chance; however all this talk from Google about fairness and equal representation makes me a bit suspicious. Not certain, just suspicious. I would have preferred a company that just said "look, we're engineers, this is the data, these are the algorithms, we don't care about the outputs, deal with it". But it looks like that stopped happening at Google (I'm not saying in this specific case, I'm saying in general) a long time ago.

jf- 2839 days ago

My comment doesn’t imply that google are biasing their results.

yorwba 2839 days ago

> Yes, precisely that's what I would expect from an NLP system

> I'm still interested in a possible technique which could lead to this type of bias without it being explicit (or requiring google to have an extremely naive approach).

I don't understand. If it's what you expect, then what's left to explain?
