Hacker News
by yorwba 2792 days ago
Check Baidu: http://www.baidu.com/s?ie=utf-8&wd=%22american+scientists%22

Result #6 is the list of African-American inventors and scientists on Wikipedia. Unless Baidu has the same ideological biases as Google (would be strange), the most likely explanation is that it's driven by n-gram frequencies.


Yes, precisely that's what I would expect from an NLP system b/c it will find "African American" (and, I would expect, "Chinese American", etc.) in documents more frequently than a plain "American", much like what this article mentions about bananas and no one ever mentioning that they're yellow. Still, the approach would have to be pretty naive not to recognize that "X-American" is a subset of "American". It would be like not recognizing that a query for "anonymous function" is something different than a query for "function".
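To make the n-gram idea concrete, here is a minimal sketch of what I mean. The titles are made up for illustration (not real search data), and the tokenizer and the `preceding_words` helper are my own hypothetical names, not anything a real search engine is known to use. It just splits on non-letters, so the hyphen in "African-American" becomes a token boundary, and then counts which word comes right before the bigram "american scientists".

```python
import re
from collections import Counter

# Toy titles, invented for illustration; not real search data.
TITLES = [
    "African-American scientists and inventors",
    "Notable African-American scientists",
    "Chinese-American scientists in physics",
    "Scientists from the United States",
    "American physicists and chemists",
]

def tokens(text):
    # Lowercase and split on anything that is not a letter, so the
    # hyphen in "African-American" becomes a token boundary.
    return re.findall(r"[a-z]+", text.lower())

def preceding_words(query, titles):
    """Count the word that comes right before each occurrence of a two-word query."""
    q = tuple(tokens(query))
    counts = Counter()
    for title in titles:
        words = tokens(title)
        # Walk over adjacent word pairs (bigrams) in the title.
        for i, pair in enumerate(zip(words, words[1:])):
            if pair == q:
                counts[words[i - 1] if i > 0 else "<start>"] += 1
    return counts

print(preceding_words("american scientists", TITLES))
```

On this toy data every hit for the bigram is preceded by "african" or "chinese", and the title that talks about scientists "from the United States" never matches at all. That's the banana effect: the unmarked case rarely uses the exact phrase.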

Here's the underlying data at duckduckgo: https://duckduckgo.com/?q=american+scientists&t=h_&ia=list

I'm still interested in a possible technique which could lead to this type of bias without it being explicit (or requiring google to have an extremely naive approach).

I don't see why you can't just accept that that naive approach is their approach? Those two words almost always occur together as part of "-American scientist." This happens to work very well in general for search engines. I don't think Google or DuckDuckGo is hoping their image page for American Scientist just returns African Americans and are therefore subtly changing their algorithm to that end.

> I don't see why you can't just accept that that naive approach is their approach?

I very strongly doubt their approach is based on substring search. They're obviously using a knowledge graph. And if you try a search for "American economists" or "American philosophers", the results look much more like what you'd expect. So either the "American" in this case is not a substring of "African-American", or they simply thought that economics and philosophy aren't as worthy of an equality boost as STEM disciplines.

> I very strongly doubt their approach is based on substring search.

You don't think Google search is using 2-grams? Do you think they're conspiring with other search engines? https://www.bing.com/images/search?q=american+scientist

Good point, you're right (and btw, that's a crappy result from Bing!). As a verification, I did another experiment:

https://www.google.com/search?q=usa+scientists

apparently gives results from the same knowledge graph and displays them under the same heading, but orders them differently from:

https://www.google.com/search?q=american+scientists

So apparently Google, while understanding the search query, still orders the results by the words used to express it, and in the second case clearly privileges African-Americans because of the "American" substring. My bad.
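A rough sketch of how that could happen, assuming both queries resolve to the same candidate set and the ordering then falls back on literal word overlap. The candidates, the crude `stem` helper and the scoring are all hypothetical; I have no idea what Google actually does.

```python
import re

# Invented candidate descriptions; assume both queries resolve to this same set.
CANDIDATES = {
    "A": "African-American scientist and inventor",
    "B": "Physicist from the USA",
    "C": "American chemist",
}

def tokens(text):
    # Lowercase and split on non-letters (the hyphen becomes a boundary).
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    # Crude plural stripping so "scientists" matches "scientist".
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def score(query, description):
    # Number of distinct query words that literally appear in the description.
    desc = {stem(w) for w in tokens(description)}
    return sum(1 for w in {stem(w) for w in tokens(query)} if w in desc)

def rank(query):
    # Highest overlap first; ties broken by candidate name for determinism.
    return sorted(CANDIDATES, key=lambda name: (-score(query, CANDIDATES[name]), name))

print(rank("american scientists"))
print(rank("usa scientists"))
```

With this toy data "american scientists" ranks A, C, B while "usa scientists" ranks A, B, C: same candidates, different order, purely because of which words the query happened to use.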

Have you considered that there may not be as many African American economists as there are scientists, doctors and inventors and that is the reason the search behaves differently? Do you have any substantive basis for your claim that google is racially biasing their results?

It's strange that you say there is no bias in the results, because in another comment you have even proposed a mechanism to explain it:

"A black inventor would be seen as out of the ordinary, and so be referred to as an African American inventor, stacking the deck in favour of those results."

So the results look biased. Now, it might just be by chance; however, all this talk from Google about fairness and equal representation makes me a bit suspicious. Not certain, just suspicious. I would have preferred a company that just said "look, we're engineers, this is the data, these are the algorithms, we don't care about the outputs, deal with it". But it looks like that stopped happening at Google (I'm not saying in this specific case, I'm saying in general) a long time ago.

My comment doesn’t imply that google are biasing their results.

> Yes, precisely that's what I would expect from an NLP system

> I'm still interested in a possible technique which could lead to this type of bias without it being explicit (or requiring google to have an extremely naive approach).

I don't understand. If it's what you expect, then what's left to explain?