Hacker News
by animalCrax0rz 2258 days ago
"there is no evidence of cancer" and "there is evidence of no cancer" are two different statements with different meaning, so it's more complex a task than just understanding the importance of "no" in a sentence. It's involves semantic analysis of the sentence. The paper I linked to below describes a technique they call "deep parsing." Check it out for more context.
5 comments

Lexical analysis is surely clever enough to make "no evidence" and "no cancer" the atoms used and should even differentiate "no cancer" from "no, cancer" pretty easily? Are we really still at 80s chatbot level functionality when it comes to string parsing?
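
To make the "atoms" idea concrete, here is a minimal sketch that treats adjacent word pairs as features. It assumes plain whitespace tokenization, and the helper name `bigrams` is made up for illustration; it is not taken from any paper or library mentioned in this thread.

```python
def bigrams(sentence):
    """Return the set of adjacent word pairs in a lowercased sentence."""
    words = sentence.lower().split()
    return set(zip(words, words[1:]))

a = bigrams("there is no evidence of cancer")
b = bigrams("there is evidence of no cancer")

# Word pairs that occur in only one of the two sentences.
only_a = a - b
only_b = b - a

print(sorted(only_a))
print(sorted(only_b))
```

The pair ("no", "evidence") shows up only for the first sentence and ("no", "cancer") only for the second, so even this crude feature set tells the two apart. It says nothing about what the pairs mean, which is where the harder part starts.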

Related aside: It frustrates me no end that spellcheck still doesn't appear to use any probabilistic considerations, like Markov chains, to determine the intended word. And that when I click the next to last letter to make an adjustment it doesn't then change the suggestions to alternate endings, etc. Perhaps newer devices than I have do this.
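
A rough sketch of what a Markov-chain-style ranking could look like: count which word follows which in some text, then prefer the candidate correction that most often follows the previous word. The corpus below is a toy one invented for this example, and `pick_candidate` is a hypothetical helper, not how any real spellchecker is known to work.

```python
from collections import Counter

# Toy corpus, invented purely for illustration.
CORPUS = (
    "they went over there . we saw their dog . "
    "go over there now . i like their dog"
).split()

# First-order Markov statistics: how often each word follows another.
BIGRAM_COUNTS = Counter(zip(CORPUS, CORPUS[1:]))

def pick_candidate(previous_word, candidates):
    """Return the candidate that most often follows previous_word in the corpus."""
    return max(candidates, key=lambda w: BIGRAM_COUNTS[(previous_word, w)])

# The same typo ("thier") gets a different fix depending on the word before it.
print(pick_candidate("over", ["their", "there"]))
print(pick_candidate("saw", ["their", "there"]))
```

A real system would need smoothing for unseen pairs and a much larger corpus; this only shows that the preceding word can break a tie between plausible corrections.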

The general problem is much harder than that. You need to understand double negatives: when do they invert the meaning and when do they underline it? "Ain't nobody got time for that" can be interpreted in different ways. And then you need to understand sarcasm. Then a later sentence can invert the meaning of an earlier one. E.g. "This is the best movie ever. Said no one". Using word-pairs as features is easy, but there are just so many exceptions and so much ambiguity that it's a very difficult problem to solve well.
It gets even harder than that: https://en.m.wikipedia.org/wiki/Winograd_Schema_Challenge

For example:

"The city councilmen refused the demonstrators a permit because they advocated violence."

Which party is "they"? There is no lexical information that can possibly answer this question. It depends entirely on an actual understanding of what "city councilmen" and "demonstrators" (in the context of city councilmen and permits!) are, and which one would be more likely to be advocating violence (and in which case that would lead to a permit denial).

Background: Until recently I worked at a symbolic AI company that was tackling this problem. I myself didn't work on this problem directly, but I became 100% convinced that their approach, while a long shot, was the only conceivable way of solving it in a fully generalized way.

Another fun sentence is "I never said she stole my money". It has 7 different meanings depending on which word is stressed/emphasized. Was about to type it all but DDG'd this:

https://www.reddit.com/r/NoStupidQuestions/comments/64ae8h/i...
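
For anyone who wants to see the seven readings side by side, a small sketch that marks each word in turn as the stressed one. The asterisks are just a convention picked here to show emphasis; the interpretation of each variant is left to the reader.

```python
sentence = "I never said she stole my money"
words = sentence.split()

# One variant per word, with the stressed word wrapped in asterisks.
variants = [
    " ".join(f"*{w}*" if i == k else w for i, w in enumerate(words))
    for k in range(len(words))
]

for v in variants:
    print(v)
```

The text of all seven variants is identical once the markers are removed, which is the point: the difference lives in prosody, not in the words.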

It gets even worse: "The city councilmen refused the demonstrators a permit because they advocated peace." The meaning of this sentence in 1968 is different than in 2020!
And then there are those who are able to say things that [intentionally] have different meaning to different audience members.
Totally. The general problem is indeed difficult to solve, but I've seen lots of great work in this area.
While it was an interesting point, it doesn't seem completely applicable here. Presumably the classifications include context. "A shirt with no stripes" should be distinguishable from "a stripes with no shirt" in this context.
This is true, but isn't really that relevant to the parent's point about statistical methods. Statistical methods (and "deep learning" is such a method) could certainly take the order of words in a sentence into account, for example.
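As a small illustration of the word-order point, here is a sketch comparing an order-blind representation (a bag of words) with an order-aware one (adjacent word pairs). Both helper names are invented for this example, and real statistical models use far richer representations than either.

```python
from collections import Counter

def bag_of_words(text):
    """Order-blind representation: word counts only."""
    return Counter(text.lower().split())

def ordered_pairs(text):
    """Order-aware representation: the sequence of adjacent word pairs."""
    words = text.lower().split()
    return list(zip(words, words[1:]))

q1 = "shirt with no stripes"
q2 = "stripes with no shirt"

same_bag = bag_of_words(q1) == bag_of_words(q2)
same_pairs = ordered_pairs(q1) == ordered_pairs(q2)

print(same_bag, same_pairs)
```

The two queries are indistinguishable as bags of words but differ as soon as word order is part of the features, so nothing about a statistical method forces it to ignore order.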
Which part of "The paper I linked to below describes a technique they call "deep parsing." Check it out for more context." could you not parse?

I mean if we humans have difficulty parsing each other's statements, then why should machines do any better?

None. I was merely pointing out that your comment wasn't a response to albertzeyer's point about statistical methods. That is to say, maybe you didn't parse their comment properly ;)
Nope, I was simply countering that it's not as simple as they suggested. I added the link to the paper to show one approach and give them some context as to what it might look like.
The number of humans that would have difficulty parsing "shirt without stripes" is very very low.
I keep trying to talk about the general problem and people keep focusing on this one example. I give up. :)
I tried to search for "cheese without holes" on Google and it yielded good results. I think the problem here is that the query is something people would rarely search.
I just searched google images for "cheese" and "cheese without holes" and I got roughly the same results (about 1/3 of the images had holes in both cases).
"pictures" and "pictures without color" show that it does get some of these, although not the way I expected.
If that's the case that's a big problem, because human children are trivially capable of both formulating and understanding uncommon sentences which still make sense.

It might be hard to come up with examples on the spot, but in everyday life you will routinely come across things you need to refer to by negation which are relatively uncommon.

Human children are also capable of walking, but it’s one of the hardest problems in robotics. Things brains can do intuitively, they can do because they have millions of years of evolution behind them.
Of the things the human brain does, I wouldn't say walking is very interesting.

The most amazing property in my opinion is the fact that it trains itself. (Whereas neural networks are trained by external systems).

I suspect it's related to imprinting. To take the example of filial imprinting, the brain must have some hardcoded notion of what a parent looks like. Then this is used to build a parent detector, and the hardcoded notion is thereafter discarded. Then the newly learnt parent detector is used in reinforcement learning (near parent = good). Keep in mind that this all happens just after birth or hatching, before the visual centres of the brain have had any chance to train.

Really cool stuff.

That will be your search bubble. You clearly prefer Stilton or Cheddar to Emmental cheese.
Yes, sure, exactly. It's a complex task. I'm just saying that there is no reason why a statistical model should not be able to solve the task. And you seem to agree on that. You even linked such a method.
Yes, I was just countering that it’s more involved than recognizing the importance of “no” in a sentence.