Hacker News
by fighting 3690 days ago
Not in either camp; I quit the NLP field out of disillusionment that it would ever lead to anything useful or meaningful.

Both rule-based and statistical approaches are fundamentally flawed because they don't incorporate any real-world information. Humans do language by starting from real-world information and mapping it to grammar or rules. Computers are trying to go the other way and are not going to succeed other than as mere toys.

Even in terms of technical progress, both rule-based and model/NN approaches are really bad, since there is no meaningful sense of iterative progress or of getting better step by step, unlike CPU chips or memory speeds. They are more of a random search in a vast space, hit or miss, getting lucky or not, which is bad both as a technology and as a career.


How about looking at it this way. We now have multiple implementations that get 94%+ success _without_ knowing anything about the world. Isn't that remarkable?

Now, to get to 99.4%+, how about we combine techniques such as spaCy or Parsey McParseFace (love the name, Google) with very simple real-world cognitive models? So for the example given, "Alice drove down the street in her car.", a simple cognitive model would _know_ that streets cannot be in cars and so be able to disambiguate. A cognitive model wouldn't know all the facts about the world; it would know certain things about streets and certain things about cars, and be able to infer on the fly whether the relationship between streets and cars matches the first parse possibility or the second. To me this seems like the obvious next step. If it's obvious to me it must have been obvious to someone else, so presumably somebody is working on it.
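To make the idea concrete, here is a minimal sketch of what such a filter might look like. It is a toy, not how spaCy or Parsey McParseFace work: the lexicon `KIND`, the table `CAN_BE_INSIDE`, and both function names are hypothetical, invented purely for illustration. It takes the candidate attachment points for "in her car" and keeps only those that a tiny world model considers plausible.

```python
# Hypothetical toy world knowledge: which kinds of things can be "in" a container.
CAN_BE_INSIDE = {
    "car": {"person", "event"},            # people, and events like driving
    "street": {"person", "car", "event"},  # cars can be in streets, not vice versa
}

# Hypothetical lexicon mapping words to coarse kinds.
KIND = {"Alice": "person", "drove": "event", "street": "street", "car": "car"}


def plausible(head, container):
    """True if the world model allows `head` to be located in `container`."""
    return KIND.get(head) in CAN_BE_INSIDE.get(container, set())


def pick_attachment(candidates, container):
    """Keep the candidate heads that 'in <container>' can plausibly modify."""
    return [head for head in candidates if plausible(head, container)]


# "Alice drove down the street in her car."
# The parser is unsure whether "in her car" modifies "drove" or "street".
print(pick_attachment(["drove", "street"], "car"))  # ['drove']
```

The hard part, of course, is not the lookup but filling in tables like these for more than two nouns.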

The 94% success rate is in made-up, limited tests. In the real world, they fail constantly in weird, horrific, or laughable ways. See any Microsoft AI public demo ever. It's like the self-driving car claim of millions of miles without accidents, except that humans took over anytime there was a chance of one.

>> We now have multiple implementations that get 94%+ success _without_ knowing anything about the world. Isn't that remarkable?

That success only lasts in the limited context of the corpora used for training. Step outside that and success goes down to 60% or much worse. And that's just tagging and things, shallow parsing. Meaning? Discourse? Don't even think about it.

I suspect another way of looking at it is that these models actually learn about the real world by reading about it in the WSJ -- of course their knowledge of it is not as deep as our own, but good enough for what they do.

That is, if you took a well-trained NLP model, you could in principle extract from it facts like "streets are not found inside cars".

That's why I never considered Chomsky's approach to make sense. Purely statistical methods aren't perfect either, but they do include some real-world information implicitly - training sets aren't random, they're taken from human communication.

But those probability models are really far removed from real world/human experience. No human is going to claim a leopard skin sofa could be an actual leopard for example. Not gonna be even a little confused.

There is just a ton of information and context the computer probability models do not have. They can use all the big data they want, but are capturing only a very thin slice of real world information.

> But those probability models are really far removed from real world/human experience. No human is going to claim a leopard skin sofa could be an actual leopard for example. Not gonna be even a little confused.

Mhm.

When humans see a Jeopardy answer looking for the name of an ancient king, they might give the wrong name, because quick, did Hadrian rule before or after Caesar?

If Watson gets it wrong, its answer is something like "What are trousers?".

It seems quite obvious that different things are going on there.

The problem with statistical information is data sparsity. You could read all English texts ever written (or spoken for that matter) and the number of meaningful combinations left to see would still be infinite. If you try to learn language only from finite examples, you'll never see enough of it to learn it well. That's why Google reports results against the Penn Treebank. It's not even clear what a good metric would be outside of finite corpora (which the field has been overfitting to for decades, as someone noted above).

Prior knowledge solves that problem. A human encounters the same sparsity a computer does when learning from text, but prior knowledge allows us to connect rare features to a larger model in which they are, in a way, less rare.
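A toy illustration of both points, under loud assumptions: the two "corpora" are made-up sentences, and the `WORD_CLASS` table is a hypothetical stand-in for prior knowledge. The sketch measures how many bigrams in the test text were never seen in training, first on raw words and then after mapping each word to a coarse class.

```python
train = "the dog chased the cat . the cat saw a bird .".split()
test = "a dog saw the bird .".split()

# Hypothetical prior knowledge: a coarse class for every word.
WORD_CLASS = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "bird": "NOUN",
    "chased": "VERB", "saw": "VERB",
    ".": "PUNCT",
}


def bigrams(tokens):
    """Adjacent token pairs, in order."""
    return list(zip(tokens, tokens[1:]))


def unseen_rate(train_tokens, test_tokens):
    """Fraction of test bigrams that never occur in the training tokens."""
    seen = set(bigrams(train_tokens))
    test_bigrams = bigrams(test_tokens)
    return sum(bg not in seen for bg in test_bigrams) / len(test_bigrams)


def to_classes(tokens):
    """Replace each word with its coarse class."""
    return [WORD_CLASS[t] for t in tokens]


print(unseen_rate(train, test))                          # 0.8
print(unseen_rate(to_classes(train), to_classes(test)))  # 0.0
```

Four of the five word bigrams in the test sentence are new, yet every class bigram has been seen before. That is the sense in which rare features become "less rare" once they are tied to a larger model, though this says nothing about where such a model would come from.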

If you think about it, there is an iteration happening within machine learning that is essentially building that prior knowledge about the world by reusing previous models as inputs to new ones. For example, spaCy uses word2vec vectors to do parsing and NER, and then sense2vec uses spaCy POS tags to create word vectors.

sense2vec.spacy.io

>> Prior knowledge solves that problem.

Prior knowledge _might_ solve that problem. It's not really solved yet so who knows. Yeah, work is ongoing and word vectors sound cool and all, but in the past people said the same thing about bag-of-words models and look where we are now.

Humans solve sparsity, sure, we learn language from ridiculously few data points, but who knows what it is that we do, exactly? If we knew, we wouldn't be discussing this.

Let's restate the problem to make sure we're talking about the same thing: the problem is that the number of possible utterances in a given language that are grammatically correct according to some grammar of that language is infinite (or so large that the universe would end before an utterance was repeated).
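For a rough sense of scale, here is a back-of-the-envelope calculation. The vocabulary size, sentence length, and enumeration rate are all assumed numbers picked for illustration, and the count is an upper bound on word sequences of that length, only a fraction of which would be grammatical.

```python
VOCAB_SIZE = 10_000        # assumed vocabulary size
SENTENCE_LENGTH = 20       # assumed sentence length in words
RATE_PER_SECOND = 10**9    # assumed: one sequence enumerated per nanosecond

# Approximate age of the universe: about 13.8 billion years, in seconds.
AGE_OF_UNIVERSE_SECONDS = 4.35e17

# Number of distinct word sequences of exactly SENTENCE_LENGTH words.
sequences = VOCAB_SIZE ** SENTENCE_LENGTH
seconds_needed = sequences // RATE_PER_SECOND

print(f"{sequences:.1e} possible sequences")
print(f"{seconds_needed / AGE_OF_UNIVERSE_SECONDS:.1e} universe ages to enumerate them")
```

Even with these modest numbers the count is 10^80, and that is for a single fixed length.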

And it's a problem because it's impossible to count infinity given only finite time. I don't see how prior knowledge, or anything else, can solve this.

Which must mean humans do something else entirely, and all our efforts that are based on the assumption that you can do some clever search and avoid having to face infinity, are misguided and doomed to fail.