Hacker News
by TeMPOraL 3694 days ago
That's why Chomsky's approach never made sense to me. Purely statistical methods aren't perfect either, but they do include some real-world information implicitly: training sets aren't random; they're taken from human communication.

But those probability models are really far removed from real world/human experience. No human is going to claim a leopard skin sofa could be an actual leopard for example. Not gonna be even a little confused.

There is just a ton of information and context that the computer probability models do not have. They can use all the big data they want, but they capture only a very thin slice of real-world information.

> But those probability models are really far removed from real world/human experience. No human is going to claim a leopard skin sofa could be an actual leopard for example. Not gonna be even a little confused.

Mhm.

When humans see a Jeopardy! answer looking for the name of an ancient king, they might give the wrong name, because, quick: did Hadrian rule before or after Caesar?

If Watson gets it wrong, its answer is something like "What are trousers?".

It seems quite obvious that different things are going on there.

The problem with statistical information is data sparsity. You could read every English text ever written (or spoken, for that matter) and the number of meaningful combinations left to see would still be infinite. If you try to learn language only from finite examples, you'll never see enough of it to learn it well. That's why Google reports results against the Penn Treebank. It's not even clear what a good metric would be outside finite corpora (which, as someone noted above, the field has been overfitting to for decades).

Prior knowledge solves that problem. A human encounters the same sparsity a computer does when learning from text, but prior knowledge allows us to connect rare features to a larger model in which they are, in a way, less rare.
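A toy illustration of both points, under made-up assumptions: the training sentences, the test sentence and the hand-written `WORD_CLASS` table (standing in for "prior knowledge") are all hypothetical. Every word bigram in the test sentence is missing from the training text, but once words are mapped to coarser classes, every class bigram has been seen.

```python
from collections import Counter

# Hypothetical toy training corpus; a real one would be millions of sentences.
TRAIN = [
    "the cat chased the mouse",
    "a dog bit the postman",
    "the child saw a bird",
]
# Hypothetical test sentence made of familiar words in unfamiliar pairs.
TEST = "a leopard chased a postman"

# Hypothetical word classes standing in for prior knowledge about the world.
WORD_CLASS = {
    "the": "DET", "a": "DET",
    "cat": "ANIMAL", "mouse": "ANIMAL", "dog": "ANIMAL",
    "bird": "ANIMAL", "leopard": "ANIMAL",
    "postman": "PERSON", "child": "PERSON",
    "chased": "VERB", "bit": "VERB", "saw": "VERB",
}


def bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


def unseen_fraction(train_sentences, test_sentence, mapping=None):
    """Fraction of test bigrams that never occur in the training data.

    If `mapping` is given, each word is replaced by its class first.
    """
    def tokens(sentence):
        words = sentence.split()
        return [mapping[w] for w in words] if mapping else words

    seen = Counter()
    for s in train_sentences:
        seen.update(bigrams(tokens(s)))
    test_pairs = bigrams(tokens(test_sentence))
    unseen = [p for p in test_pairs if p not in seen]
    return len(unseen) / len(test_pairs)


print(unseen_fraction(TRAIN, TEST))              # 1.0: no word bigram was seen
print(unseen_fraction(TRAIN, TEST, WORD_CLASS))  # 0.0: every class bigram was seen
```

Real systems learn the classes (or vectors) instead of hand-writing them, but the effect is the same: a rare pairing of words becomes a common pairing of kinds of things.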

If you think about it, there is an iteration happening within machine learning that is essentially building that prior knowledge about the world by reusing previous models as inputs to new ones. For example, spaCy uses word2vec vectors for parsing and NER, and sense2vec then uses spaCy's POS tags to create word vectors.

sense2vec.spacy.io
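A minimal sketch of that kind of stacking, assuming a toy setup: `toy_tag` is a hypothetical lookup-table tagger standing in for a trained model such as spaCy's (it does not use spaCy's API), and raw co-occurrence counts stand in for learned word vectors. The first stage's tags become part of the second stage's vocabulary, written `word|TAG` in the spirit of sense2vec, so the noun and verb senses of "duck" end up with separate vectors.

```python
from collections import Counter, defaultdict

# Stage 1: hypothetical stand-in for a trained POS tagger.
LEXICON = {"the": "DET", "a": "DET", "we": "PRON", "to": "PART",
           "in": "ADP", "under": "ADP", "swam": "VERB", "had": "VERB",
           "saw": "VERB"}


def toy_tag(tokens):
    """Tag "duck" as VERB after "to" or a pronoun, NOUN otherwise;
    look up every other word, defaulting unknown words to NOUN."""
    tags = []
    for i, word in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else ""
        if word == "duck":
            tags.append("VERB" if prev in {"to", "we", "they", "i"} else "NOUN")
        else:
            tags.append(LEXICON.get(word, "NOUN"))
    return tags


def sense_tokens(sentence):
    """Attach stage-1 tags to words, e.g. "duck|NOUN"."""
    tokens = sentence.lower().split()
    return [f"{w}|{t}" for w, t in zip(tokens, toy_tag(tokens))]


# Stage 2: count-based "vectors" over the tagged tokens.
def cooccurrence_vectors(sentences, window=2):
    vectors = defaultdict(Counter)
    for s in sentences:
        toks = sense_tokens(s)
        for i, center in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[center][toks[j]] += 1
    return vectors


CORPUS = [
    "the duck swam in the pond",
    "we had to duck under the ball",
    "a duck saw the pond",
]
vecs = cooccurrence_vectors(CORPUS)
print(dict(vecs["duck|NOUN"]))
print(dict(vecs["duck|VERB"]))
```

A real pipeline would train dense vectors (word2vec-style) on the tagged tokens rather than keep raw counts; the point here is only that the output of one model becomes the input vocabulary of the next.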

>> Prior knowledge solves that problem.

Prior knowledge _might_ solve that problem. It's not really solved yet so who knows. Yeah, work is ongoing and word vectors sound cool and all, but in the past people said the same thing about bag-of-words models and look where we are now.

Humans solve sparsity, sure: we learn language from ridiculously few data points. But who knows what exactly it is that we do? If we knew, we wouldn't be discussing this.

Let's restate the problem to make sure we're talking about the same thing: the number of possible utterances in a given language that are grammatically correct according to some grammar of that language is infinite (or so large that the universe would end before any utterance had to be repeated).
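Some back-of-the-envelope arithmetic for the "so large" case, under assumed round numbers (vocabulary size, sentence length and enumeration rate are illustrative choices, not measurements). It counts every word string of a fixed length, grammatical or not, so it is a loose upper bound for that length, but it shows why enumeration is hopeless even before length is unbounded.

```python
import math

# Assumed round numbers, chosen only for illustration.
VOCAB = 10_000                  # words in a modest vocabulary
LENGTH = 20                     # words per sentence
RATE = 1e9                      # sentences enumerated per second
UNIVERSE_AGE_S = 13.8e9 * 365.25 * 24 * 3600  # about 4.4e17 seconds


def log10_num_strings(vocab_size, length):
    """log10(vocab_size ** length): every word string of exactly `length`
    words, grammatical or not, so a loose upper bound for that length."""
    return length * math.log10(vocab_size)


log_n = log10_num_strings(VOCAB, LENGTH)
# log10 of how many times the current age of the universe the enumeration takes
log_ages = log_n - math.log10(RATE) - math.log10(UNIVERSE_AGE_S)

print(f"about 10^{log_n:.0f} strings of {LENGTH} words")
print(f"about 10^{log_ages:.0f} times the age of the universe")
```

With these assumptions it reports about 10^80 strings and about 10^53 times the current age of the universe.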

And it's a problem because it's impossible to count infinity given only finite time. I don't see how prior knowledge, or anything else, can solve this.
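The "infinite" part comes from recursion. A minimal sketch, assuming a hypothetical one-rule toy grammar (NP -> "the cat" | "the cat that saw" NP): each extra application of the rule yields a new grammatical sentence, so for any finite collection of sentences there is always one longer than all of them.

```python
def noun_phrase(depth):
    """Apply the toy rule NP -> "the cat that saw" NP `depth` times."""
    phrase = "the cat"                         # base case: NP -> "the cat"
    for _ in range(depth):
        phrase = "the cat that saw " + phrase  # recursive case
    return phrase


def sentence(depth):
    return noun_phrase(depth) + " slept"


# A finite "corpus" of every sentence up to depth 99.
corpus = {sentence(d) for d in range(100)}
longest = max(len(s.split()) for s in corpus)

# One more level of recursion gives a sentence the corpus cannot contain.
new = sentence(100)
print(new not in corpus, len(new.split()) > longest)
```

Each sentence has 4 * depth + 3 words, so no corpus of any size covers every depth.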

That must mean humans do something else entirely, and that all our efforts based on the assumption that some clever search lets you avoid facing infinity are misguided and doomed to fail.