Hacker News
by hashr8064 2804 days ago
I think Google should take a bit of its own advice here. Google "american scientists" or "american mathematicians" and tell me how these are not hugely biased results. I work in NLP, and there is no way you get this without forcing your data/algorithms to return these types of results.

This isn't a normative statement, just a descriptive one. Whether or not Google should bias its results is a completely different discussion.


1) I'm trying to figure out what NLP has to do with this problem. This is a classic collaborative filtering "problem."

2) I think Google is acutely aware that its results are driven by human behavior and are therefore biased. That's the nature of its design.

Can you explain how this is collaborative filtering as opposed to a classic IR ranking problem? CF would suggest they are somehow getting user ratings of these scientists, but either way it's going to boil down to a similarity metric. So, for me, I can't imagine how user data would create these rankings, and I'm pretty confident that using IR techniques on the datasets they have would not return these results either. Ergo, they are likely tweaking the ranking factors themselves to return results that are "less biased", i.e., less representative of the underlying distribution and more normally distributed, a.k.a. politically correct.

But if you have a better theory of how 10 of the first 20 "American scientists" results are black and 5 are women, I'd be interested to hear it.
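For comparison, here is a minimal sketch of the kind of "classic IR ranking" the comment mentions: TF-IDF weighting plus cosine similarity, with no user data and no manual adjustment. The three-document corpus and the names `DOCS`, `tfidf_vectors` and `rank` are invented for illustration, and nothing here claims to be how Google actually ranks results; it only shows what a plain term-matching similarity metric does with the query.

```python
import math
import re
from collections import Counter

# Invented three-document corpus; not real index data.
DOCS = {
    "aa_list": "List of African American inventors and scientists. "
               "African American scientists made major contributions.",
    "generic": "Scientists in the United States work in many fields.",
    "journal": "American Scientist is a science magazine.",
}


def tokenize(text):
    """Lowercase and split on anything that is not a letter."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for every document."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    # Smoothed IDF so terms present in every document still get weight 1.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = {
        name: {t: c * idf[t] for t, c in Counter(toks).items()}
        for name, toks in tokenized.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return (doc_name, score) pairs sorted by descending cosine similarity."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the corpus get zero weight.
    q_vec = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scores = {name: cosine(q_vec, vec) for name, vec in vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for name, score in rank("american scientists", DOCS):
    print(f"{name}: {score:.3f}")
```

On this toy corpus the list page (`aa_list`) comes first simply because it repeats both query terms, while `generic`, which is about scientists in the United States but never says "American", comes last. Whether that carries over to a real index is exactly what the thread is arguing about.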

Check Baidu: http://www.baidu.com/s?ie=utf-8&wd=%22american+scientists%22

Result #6 is the list of African-American inventors and scientists on Wikipedia. Unless Baidu has the same ideological biases as Google (which would be strange), the most likely explanation is that it's driven by n-gram frequencies.
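The n-gram explanation can be sketched by counting what precedes each occurrence of the bigram "american scientists". The four snippets below are made up for illustration (they are not Baidu or Google data), and `bigram_contexts` is a hypothetical helper; the point is only that a bigram's raw count can be dominated by the longer phrases it is embedded in.

```python
import re
from collections import Counter

# Invented snippets standing in for a web corpus; not real search-engine data.
CORPUS = [
    "African American scientists and inventors changed medicine.",
    "A list of African American scientists.",
    "Chinese American scientists at national labs.",
    "Famous American scientists of the 20th century.",
]


def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(toks, n):
    """Return the list of consecutive n-token tuples in `toks`."""
    return list(zip(*(toks[i:] for i in range(n))))


def bigram_contexts(corpus, bigram):
    """Count the token immediately preceding each occurrence of `bigram`."""
    before = Counter()
    for doc in corpus:
        toks = tokens(doc)
        for i, gram in enumerate(ngrams(toks, 2)):
            if gram == bigram:
                before[toks[i - 1] if i > 0 else "<start>"] += 1
    return before


contexts = bigram_contexts(CORPUS, ("american", "scientists"))
print(contexts)
```

Here the bigram occurs four times: twice after "african", once after "chinese" and once after the unqualified "famous". A ranker keyed on that bigram would mostly be looking at "X American scientists" documents without anyone having steered it there.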

Yes, precisely that's what I would expect from an NLP system, because it will find "African American" (and, I would expect, "Chinese American", etc.) in documents more frequently than a plain "American", much like what this article mentions about bananas: no one ever mentions that they're yellow. Still, the approach would have to be pretty naive not to recognize that "X-American" is a subset of "American". It would be like not recognizing that a query for "anonymous function" is something different from a query for "function".

Here's the underlying data at duckduckgo: https://duckduckgo.com/?q=american+scientists&t=h_&ia=list

I'm still interested in a possible technique which could lead to this type of bias without it being explicit (or requiring google to have an extremely naive approach).
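The distinction drawn above, between a system that counts "African-American scientists" as a hit for "american scientists" and one that recognizes "X-American" as a narrower term, can be sketched with two matchers. `QUALIFIERS` is a hypothetical hand-written list and both functions are illustrative assumptions, not how any search engine is known to work.

```python
import re

# Hypothetical, hand-written list of modifiers that narrow "American".
QUALIFIERS = {"african", "asian", "chinese", "irish", "mexican"}


def naive_phrase_match(text, phrase):
    """True if `phrase` occurs anywhere in `text` (case- and hyphen-insensitive)."""
    norm = re.sub(r"[^a-z]+", " ", text.lower())
    return f" {phrase.lower()} " in f" {norm} "


def unqualified_phrase_match(text, phrase):
    """True only if `phrase` occurs without a known qualifier directly before it."""
    toks = re.findall(r"[a-z]+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    for i in range(len(toks) - n + 1):
        if toks[i:i + n] == target and (i == 0 or toks[i - 1] not in QUALIFIERS):
            return True
    return False


doc = "Famous African-American scientists and inventors"
print(naive_phrase_match(doc, "american scientists"))        # True
print(unqualified_phrase_match(doc, "american scientists"))  # False
```

A fixed list like this is obviously brittle; it is only there to make the check concrete. A production system would have to infer which modifiers narrow a term rather than hard-code them.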

I don't see why you can't just accept that that naive approach is their approach? Those two words almost always occur together as part of "-American scientist." This happens to work very well in general for search engines. I don't think Google or DuckDuckGo is hoping their image page for "American scientists" returns only African Americans and is therefore subtly changing its algorithm to that end.

> I don't see why you can't just accept that that naive approach is their approach?

I very strongly doubt their approach is based on substring search; they're obviously using a knowledge graph. And if you search for "American economists" or "American philosophers", the results look much more like what you'd expect. Either "American" in this case is not a substring of "African-American", or they simply decided that economics and philosophy aren't as worthy of an equality boost as STEM disciplines.

> Yes, precisely that's what I would expect from an NLP system

> I'm still interested in a possible technique which could lead to this type of bias without it being explicit (or requiring google to have an extremely naive approach).

I don't understand. If it's what you expect, then what's left to explain?

As someone else has pointed out, it's because "American scientists" is a substring of "African American scientists." And it's easy to check that this is the case: search for "African American scientists" and the first results are the same. That makes sense: exact match vs. inferred connection. Then go search for "English scientists", "French scientists", or "German scientists."