by lucian-g 2936 days ago
Data is biased => answers are biased.

> Which race is superior (A) white (B) black?

> Aristo's Answer: (A) white

> Confidence: 76.81%

> Justification Sentence: that the white races are superior to the colored;

> Knowledge Used: [ the white man | was superior in ] [ the white race | was superior to ] [ the white race | is | superior to the other races ] [ the white race | is superior to ]

The linked paper under MORE INFO doesn't include that sentence, but from phrasing it looks like an entry in a series of biases, not an endorsement of that idea.

http://aristo-demo.allenai.org/ask?q=Which%20race%20is%20sup....

5 comments

This is a really interesting find.

To be clear on what is happening here:

Method 1 (Information Retrieval): Aristo generates candidate answers (essentially by substituting the possible answers into the question). It then runs information retrieval (i.e. search) over a set of pre-validated legitimate sources, finds the sentence that aligns most closely with each candidate answer, and builds scores based on that alignment.

Method 2 (Topic Matching): I haven't studied this enough to understand it

Method 3 (Tuple Reasoning): They use open information extraction on a set of pre-validated legitimate sources to build tuple statements (think RDF), then use logical inference over them.

The problem is that the pre-validated sources include large amounts of discussion of white supremacy. Someone debunking it (as Rajiv Gandhi did in his statement "History is full of such prejudices paraded as iron laws that men are superior to women; that the white races are superior to the colored") uses a phrase which causes problems in all three of these methods.

It's really hard to know what to do here. I think if I was building the system I'd try to detect that kind of pseudo-science question and refuse to answer it.
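A minimal sketch of how Method 1 could go wrong on the sentence quoted above. This is not Aristo's code: the bag-of-words overlap score, the function names, and the two-sentence corpus are all assumptions made for illustration.

```python
import re

def tokens(text):
    """Lowercased set of word tokens; punctuation is dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def overlap_score(candidate, sentence):
    """Fraction of candidate tokens that also appear in the sentence."""
    cand = tokens(candidate)
    return len(cand & tokens(sentence)) / len(cand)

def answer(question, options, corpus):
    """Substitute each option into the question, retrieve the best-aligned
    sentence for it, and return (score, option, sentence) for the winner."""
    scored = []
    for option in options:
        candidate = f"{question} {option}"  # crude stand-in for substitution
        best = max(corpus, key=lambda s: overlap_score(candidate, s))
        scored.append((overlap_score(candidate, best), option, best))
    return max(scored)

# Hypothetical corpus: one debunking sentence and one unrelated sentence.
CORPUS = [
    "History is full of such prejudices paraded as iron laws that men are "
    "superior to women; that the white races are superior to the colored",
    "Lakes and rivers are landforms shaped by water.",
]

score, option, sentence = answer("Which race is superior?", ["white", "black"], CORPUS)
print(option, round(score, 2), sentence[:40])
```

The scorer only counts shared words, so the sentence that rejects the claim still aligns best with the candidate that states it.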

> It's really hard to know what to do here.

Is it? It looks like the natural language processing part is simply not very good. Improve that.

> I'd try to detect that kind of pseudo-science question

That wouldn't fix the general problem that this system seems to treat sentences of the form "some people incorrectly claim X" as an assertion that X is a fact.

> Is it? It looks like the natural language processing part is simply not very good. Improve that.

It’s really hard to avoid a sarcastic reply here.

The AllenAI institute probably has the 3rd best-known NLP team in the world after Google and Facebook. They basically have the Washington State NLP group.

Given that, and their impressive record of publications (e.g. ELMo), I think it’s fair to say that they are trying.

I'm sure they are very good on some things, and I'll believe you when you say that they are the 3rd best in the world in relative terms.

But let's look at absolute terms. In the example above, "History is full of such prejudices paraded as iron laws that men are superior to women; that the white races are superior to the colored", it takes a part of the sentence and treats it as a fact, disregarding the context that just happens to claim the opposite. In my example in https://news.ycombinator.com/item?id=17301383 it treats a question as an assertion of a fact.

I'm not an expert on NLP, but I have played with it just enough to confidently claim that this is not very impressive performance.

If you claim that detecting "pseudo-science questions" is within reach, surely you must agree that "not mistaking questions for assertions of fact" and "not ripping parts of sentences out of context" must be within reach as well?

Detecting pseudo-science questions is just topic detection. That's easy.

"Not mistaking questions for assertions of fact" is basically claim verification. That's pretty much beyond the reach of NLP systems at the moment. It's an active area of research, but if this system doesn't impress you then current claim verification systems most definitely won't either.

Trying to understand the context of sentences might be possible. I think that sentence would challenge that approach for a while: "prejudices" implies bias, but doesn't necessarily imply disagreement.

> not mistaking questions for assertions of fact is basically claim verification. That's pretty much beyond the reach of NLP systems at the moment.

Ah, OK. I guess you are one of those people for whom NLP is only the newfangled statistical stuff, not the old-school NLP that looks at grammar and such things to (surprisingly) find that "X is a Y ." and "is X a Y ?" are not the same sequence of tokens.

> Trying to understand the context of sentences might be possible.

I didn't say they must understand the context. I said that if they don't understand it, they shouldn't choose a substring out of that sentence and claim that it is an assertion of fact on its own.
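A rough sketch of the two checks argued for in this comment: a token-level test that tells "is X a Y ?" from "X is a Y .", and a rule that refuses to use a fragment of a longer sentence as a standalone fact. The word list, function names, and normalisation are assumptions for illustration; nothing here is taken from Aristo.

```python
import re

# Assumed list of auxiliaries that signal subject-verb inversion.
AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did",
               "can", "could", "will", "would", "should", "has", "have"}

def is_question(sentence):
    """Heuristic: ends with '?' or opens with an inverted auxiliary."""
    if sentence.rstrip().endswith("?"):
        return True
    words = re.findall(r"[a-z']+", sentence.lower())
    return bool(words) and words[0] in AUXILIARIES

def normalise(text):
    """Lowercase and collapse to space-separated word tokens."""
    return " ".join(re.findall(r"[a-z']+", text.lower()))

def usable_as_fact(span, sentence):
    """Accept a span only if it is the whole of a non-question sentence."""
    if is_question(sentence):
        return False
    return normalise(span) == normalise(sentence)

quote = ("History is full of such prejudices paraded as iron laws that men are "
         "superior to women; that the white races are superior to the colored")

print(is_question("is X a Y ?"))   # inverted auxiliary and a question mark
print(is_question("X is a Y ."))   # plain declarative
print(usable_as_fact("that the white races are superior to the colored", quote))
```

Both checks are purely syntactic, so they will miss rhetorical or embedded questions, but they need no understanding of the surrounding context.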

Ugh, that's bad.

http://aristo-demo.allenai.org/ask?q=Who%20is%20smarter%3F%2...

    Who is smarter?

    (A) men
    (B) women

    Aristo's Answer: (A) men

    Confidence: 89.99%
    as computed from these reasoners:

    Information Retrieval: 98.11% More Info

    Justification Sentence: Who are smarter: men or women?
Interesting that the "justification sentence" is just a repetition of the question.
Did they "fix" it?

This is what I get now:

Aristo is not sure about this one...

Aristo's best guess: (B) women

Confidence: 10.38%

as computed from these reasoners: Topic Matching: 85.98% MORE INFO

Topic: flourish

Yes, they seem to have changed a bunch of the examples linked in this thread. Dunno if it's general changes or quick manual hacks they bolted on for specific cases.
Wow. That's both jarring and a great example of machine bias.
Possible correction: this does not appear to be an example of machine bias. It's also important to keep in mind that there can be other sources (such as brittleness) of bad ML outcomes than bias.

When I do an exact search for the Justification Sentence with Google, what best matches is a quote by Rajiv Gandhi. The relevant context is: "History is full of such prejudices paraded as iron laws"

His stance is clearly opposite to what the extracted text implies. This is a common problem with knowledge extraction and one I've run into often myself.

An extracted phrase, or the utterance of a generative model, cannot be trusted, because the original meaning can be the opposite of what is presented. Existing models fail to preserve the nuance imparted by context, struggle with negation, and lack deep understanding and the ability to truly reason.

I remember a teacher who avoided writing spelling mistakes on the blackboard and simply wrote the correct form, lest pupils misremember the wrong one. That might sound obvious, but the context was a talk about mistakes made in exercises.

It's really hard not to mention negatives to illustrate contrast.

In other words: Some people need to learn to speak constructively. An AI would do best ignoring negative remarks and simply learning provable facts (instead of faking understanding by simply echoing a quote out of context -- see there I wrote redundant information).

I wonder whether anyone would agree that the above quote was against the HN guideline to leave out dismissive remarks like ... (ha, I'm not going to repeat the specific example). Theorizing about potential referents for "such", "that", etc. must be very difficult, especially now that that that that is often used superfluously is acceptable to some.

Aristo can't answer "What are the advantages of global warming?" either :)
It's not only data bias:

Question: Which party is superior? (a) Democrats (b) Republicans

Aristo's Answer: (b) Republicans

Confidence: 94.04% as computed from these reasoners:

Information Retrieval: 82.05% More Info

Justification Sentence: S-8155 of the State of Alaska, and ) THE REPUBLICAN MODERATE PARTY,) Superior Court No.

Yeah, but at some point it gets ridiculous:

http://aristo-demo.allenai.org/ask?q=Which%20landform%20is%2...

Question: Which landform is superior?

Aristo's Answer: (a) Lakes Confidence: 80.76%

as computed from these reasoners:

Information Retrieval: 43.04% MORE INFO Justification Sentence: One of the most conspicuous Pleistocene landforms in Wisconsin, the spillway of Glacial Lake Superior, is now occupied by the St. Croix and Brule Rivers.

Topic Matching: 93.92% MORE INFO Topic: outwash, landforms

Tuple Reasoning: 91.37% MORE INFO Knowledge Used: [ Lake Superior | is | unlike the other lakes ] [ The Lake Superior Trail | follows | the shore of Lake Superior ]

That doesn't seem too crazy.

You realise it is because it is called “Lake Superior”, right?

https://en.m.wikipedia.org/wiki/Lake_Superior
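A small sketch of why the name matters, assuming the matcher lowercases everything before comparing words. The capitalisation rule and the function name are illustrative guesses, not a description of Aristo.

```python
import re

def adjective_uses(word, sentence):
    """Count occurrences of `word` in one sentence that do not look like
    part of a name, i.e. that are not capitalised after the first token."""
    toks = re.findall(r"[A-Za-z]+", sentence)
    count = 0
    for i, tok in enumerate(toks):
        if tok.lower() != word:
            continue
        if i > 0 and tok[0].isupper():
            continue  # capitalised mid-sentence: probably a proper noun
        count += 1
    return count

print(adjective_uses("superior", "Lake Superior is unlike the other lakes"))
print(adjective_uses("superior", "This design is superior to the old one"))
```

A case-insensitive matcher treats both sentences as evidence about "superior"; keeping the case information separates the lake from the adjective.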

Did you not read the instructions? Aristo is designed to answer multiple choice grade school science questions, not abstract and cheap virtue signalling nonsense.
> grade school science questions, not abstract and cheap virtue signalling nonsense

"Are there differences between human races" seems like a pretty basic grade school science question.

I'm not sure I understand. Do you believe that the correct answer is "no"?
Do you believe the correct answer is "yes" with no further qualification needed?
If you ask a yes/no question, then the answer should just be that. If you want to get a qualified answer, you should ask a qualified question.
I'm sorry you had such a bad grade school experience.
The question is yes/no... Obviously ANY yes/no question which isn't exactly reducible to a yes/no answer requires qualification.
Question: Where is Brazil?

Aristo is not sure about this one...

Aristo's best guess: Additionally, the Chinese Academy of Sciences, the Atlas of Living Australia, Brazil, and the Bibliotheca Alexandrina have created regional BHL sites.

Confidence: 16.75%