by pbhjpbhj 2258 days ago
Lexical analysis is surely clever enough to make "no evidence" and "no cancer" the atoms used and should even differentiate "no cancer" from "no, cancer" pretty easily? Are we really still at 80s chatbot level functionality when it comes to string parsing?
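
To make the "no cancer" versus "no, cancer" point concrete, here is a minimal sketch in Python. It assumes a simple regex tokenizer that keeps punctuation as separate tokens; the function names `tokens` and `bigrams` are made up for illustration and don't come from any particular library.

```python
import re

def tokens(text):
    # Lowercase, then split into words (letters/apostrophes) and
    # single punctuation marks, so a comma becomes its own token.
    return re.findall(r"[a-z']+|[^\sa-z']", text.lower())

def bigrams(text):
    # Adjacent token pairs: the "atoms" a word-pair model would see.
    toks = tokens(text)
    return list(zip(toks, toks[1:]))

print(bigrams("No cancer was found"))
print(bigrams("No, cancer was found"))
```

With the comma kept as a token, the pair `("no", "cancer")` only shows up in the first sentence, so the two cases are at least distinguishable at this level. It says nothing about what either sentence actually means.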

Related aside: It frustrates me no end that spellcheck still doesn't appear to use any probabilistic considerations, like Markov chains, to determine the intended word. And when I click the next-to-last letter to make an adjustment, it doesn't then change the suggestions to alternate endings, etc. Perhaps newer devices than mine do this.
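
Roughly what I have in mind, as a toy sketch rather than how any real spellchecker works: a first-order Markov (bigram) model ranks candidate corrections by how often they follow the previous word. The corpus, `edits1`, and `suggest` are all invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; a real model would be trained on far more text.
CORPUS = ("the fast car won the race . the hungry cat ate the fish . "
          "a fast car is fun . a hungry cat is loud").split()

# bigram_counts[prev][word] = how often `word` followed `prev`.
bigram_counts = defaultdict(Counter)
for prev, word in zip(CORPUS, CORPUS[1:]):
    bigram_counts[prev][word] += 1
VOCAB = set(CORPUS)

def edits1(word):
    # All strings one edit (delete, swap, replace, insert) away.
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

def suggest(prev_word, typo):
    # Known words one edit away, most likely after prev_word first.
    candidates = edits1(typo) & VOCAB
    return sorted(candidates, key=lambda w: (-bigram_counts[prev_word][w], w))

print(suggest("fast", "caq"))
print(suggest("hungry", "caq"))
```

The same typo gets a different top suggestion depending on the preceding word, which is the behaviour I'd expect from a context-aware spellcheck.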


The general problem is much harder than that. You need to understand double negatives: when do they invert the meaning, and when do they reinforce it? "Ain't nobody got time for that" can be interpreted in different ways. And then you need to understand sarcasm. A later sentence can also invert the meaning of an earlier one, e.g. "This is the best movie ever. Said no one". Using word pairs as features is easy, but there are so many exceptions and so much ambiguity that it's a very difficult problem to solve well.
It gets even harder than that: https://en.m.wikipedia.org/wiki/Winograd_Schema_Challenge

For example:

"The city councilmen refused the demonstrators a permit because they advocated violence."

Which party is "they"? There is no lexical information that can possibly answer this question. It depends entirely on an actual understanding of what "city councilmen" and "demonstrators" (in the context of city councilmen and permits!) are, and which one would be more likely to be advocating violence (and in which case that would lead to a permit denial).

Background: Until recently I worked at a symbolic AI company that was tackling this problem. I didn't work on it directly myself, but I became 100% convinced that their approach, while a long shot, was the only conceivable way of solving it in a fully generalized way.

Another fun sentence is "I never said she stole my money". It has 7 different meanings depending on which word is stressed/emphasized. Was about to type it all but DDG'd this:

https://www.reddit.com/r/NoStupidQuestions/comments/64ae8h/i...
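
For anyone who wants the seven variants without typing them out, a minimal sketch: it only marks each word in turn with asterisks and does not attempt to gloss what each reading means. The helper name `stress_variants` is made up.

```python
SENTENCE = "I never said she stole my money"

def stress_variants(sentence):
    # Yield one copy of the sentence per word, with that word marked
    # as stressed using *asterisks*.
    words = sentence.split()
    for i in range(len(words)):
        yield " ".join(f"*{w}*" if j == i else w for j, w in enumerate(words))

for variant in stress_variants(SENTENCE):
    print(variant)
```

Each printed line reads differently when you stress the marked word, and none of that information is present in the plain text itself.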

It gets even worse: "The city councilmen refused the demonstrators a permit because they advocated peace." The meaning of this sentence in 1968 is different from its meaning in 2020!
And then there are those who are able to say things that [intentionally] have different meaning to different audience members.
Totally. The general problem is indeed difficult to solve, but I've seen lots of great work in this area.