Hacker News
by feral 3695 days ago
I'd love to hear Chomsky's reaction to this stuff (or someone in his camp on the Chomsky vs. Norvig debate [0]).

My understanding is that Chomsky was against statistical approaches to AI as scientifically unhelpful: eventual dead ends that would reach a certain accuracy and then plateau. He preferred the purer logic/grammar approaches, which reductionistically and generatively decompose things into constituent parts in some interpretable way, and are therefore more scientifically valuable and more composable, i.e. easier to build on.

But now we're seeing these very successful blended approaches, where you've got a grammatical search, which is reductionist and produces an interpretable factoring of the sentence, but it's guided by a massive (comparatively uninterpretable) neural net.
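A minimal sketch of that blend, assuming an arc-standard transition system. As I understand it, SyntaxNet (the framework behind Parsey McParseface) is a transition-based parser whose moves are scored by a neural net, though it uses beam search rather than the greedy choice below. The point is the division of labour: rules decide which moves are legal, and a scoring function only chooses among them. `toy_score` and its `PREFS` table are hypothetical stand-ins for the trained network.

```python
# Hypothetical stand-in for a trained network: scores a (head tag, dependent tag) arc.
PREFS = {("ROOT", "VERB"): 0.5, ("VERB", "NOUN"): 1.0, ("NOUN", "DET"): 1.0}

def toy_score(tags, head, dep):
    return PREFS.get((tags[head], tags[dep]), -1.0)

def legal_actions(stack, buffer):
    """Rule-defined moves of the arc-standard system (index 0 is ROOT)."""
    actions = []
    if buffer:
        actions.append("SHIFT")
    if len(stack) >= 2:
        if stack[-2] != 0:                 # ROOT can never be a dependent
            actions.append("LEFT")
        if stack[-2] != 0 or not buffer:   # attach to ROOT only once everything is read
            actions.append("RIGHT")
    return actions

def parse(tags, score=toy_score):
    """Greedy arc-standard parse; returns the head index of each word 1..n."""
    stack, buffer = [0], list(range(1, len(tags)))
    heads = [None] * len(tags)

    def action_score(action):
        if action == "SHIFT":
            return 0.0
        s1, s0 = stack[-2], stack[-1]
        head, dep = (s0, s1) if action == "LEFT" else (s1, s0)
        return score(tags, head, dep)

    while buffer or len(stack) > 1:
        best = max(legal_actions(stack, buffer), key=action_score)
        if best == "SHIFT":
            stack.append(buffer.pop(0))
        elif best == "LEFT":               # s0 becomes head of s1
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        else:                              # s1 becomes head of s0
            dep = stack.pop()
            heads[dep] = stack[-1]
    return heads[1:]

print(parse(["ROOT", "DET", "NOUN", "VERB"]))  # tags for "the dog barked"
```

For "the dog barked" this yields heads `[2, 3, 0]`: the determiner attaches to the noun, the noun to the verb, and the verb to ROOT. A better scorer changes which legal move wins, never which moves are legal, so the output is always an interpretable tree.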

It's like AlphaGo, which is still doing search in a very structured, rule-based, reductionist way, but leverages the more black-box statistical neural network to make the search actually efficient, and qualitatively more useful. Is this an emerging paradigm?

I used to have a lot of sympathy for the Chomsky argument, and thought Norvig et al. [the machine learning community] could be accused of talking up a more prosaic 'applied ML' agenda as more scientifically worthwhile than it actually was.

But I think systems like this are evidence that gradual, incremental improvement of working statistical systems can eventually yield more powerful reductionist/logical systems overall. I'd love to hear an opposing perspective from someone in the Chomsky camp, in the context of systems like this. (Hopefully I'm not strawmanning that camp here.)

[0]Norvig's article: http://norvig.com/chomsky.html


I don't think you're strawmanning in general; there have been a lot of symbolic AI people who scoffed at any mention of statistics or real-world data. But you don't have to eschew all empiricism just because you use rules.

See e.g. http://visl.sdu.dk/~eckhard/pdf/TIL2006.pdf which gets 99% on POS tagging and 96% on syntactic function assignment. Constraint Grammar parsers are the state of the art among rule-based systems, and the well-developed ones beat statistical systems. CGs are also multitaggers: they don't assume a word has to have only one reading. It might actually be ambiguous, and in that case it shouldn't be disambiguated further (that's why they use F-scores instead of plain "accuracy").
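To see why a multitagger is scored with precision/recall rather than accuracy, here is a small sketch that counts matches at the level of individual readings. The evaluation protocol in the linked paper may well differ; the function name and the toy tag sets are illustrative assumptions.

```python
def multitag_prf(gold, predicted):
    """Precision, recall and F1 over readings.

    gold, predicted: lists of sets, one set of readings per token. A token may
    legitimately keep several readings in either list.
    """
    true_pos = sum(len(g & p) for g, p in zip(gold, predicted))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [{"DET"}, {"N"}, {"N", "V"}]       # last token genuinely ambiguous
predicted = [{"DET"}, {"N", "V"}, {"N"}]  # one reading left too many, one too few
print(multitag_prf(gold, predicted))
```

Here precision, recall and F1 all come out at 0.75. A single-tag accuracy score would have no sensible way to credit the genuinely ambiguous token or penalise the leftover reading.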

CGs also require manual work, so it's not like you can download a corpus and learn everything unsupervised; but on the other hand, for which languages in the world do you have a large enough data set to learn a good model unsupervised? And with which training methods can you even get good models from unlabeled data? The set of languages with large annotated corpora (especially treebanks) is even smaller… So CGs are also heavily used for lesser-resourced languages (typically in combination with finite-state transducers for morphological analysis), where the lack of training data makes it a lot more cost-effective to write rules (and turn existing dictionaries into machine-readable FSTs) than to create annotated training data (which would often involve OCR-ing texts, introducing yet another error source). CG writers still tend to have a very empirical mindset: no toy sentences like "put the cone on the block", but continual testing on any real-world text they can get their hands on.
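A toy version of how Constraint Grammar disambiguation works: a morphological analyser proposes every possible reading, and hand-written rules remove readings in context, never deleting a word's last one. Here a hypothetical `LEXICON` dict stands in for the FST, and each rule only loosely imitates a CG rule of the form `REMOVE V IF (-1C DET)` (remove the verb reading if the previous word is unambiguously a determiner). Real CG rule syntax, sets and application order are much richer than this.

```python
# Hypothetical lexicon standing in for an FST morphological analyser:
# each word form maps to every reading it could have.
LEXICON = {"the": {"DET"}, "can": {"N", "V", "AUX"}, "rusted": {"V"}}

# Each rule reads roughly as "REMOVE <target> IF the previous word is unambiguously <tag>".
RULES = [("V", "DET"), ("AUX", "DET")]

def analyse(words):
    """Give every word all of its lexicon readings (copied, so rules can't mutate LEXICON)."""
    return [set(LEXICON.get(w.lower(), {"UNKNOWN"})) for w in words]

def disambiguate(readings, rules):
    """Apply REMOVE rules until nothing changes, never deleting a word's last reading."""
    changed = True
    while changed:
        changed = False
        for target, prev_tag in rules:
            for i in range(1, len(readings)):
                current = readings[i]
                if target in current and len(current) > 1 and readings[i - 1] == {prev_tag}:
                    current.discard(target)
                    changed = True
    return readings

print(disambiguate(analyse(["The", "can", "rusted"]), RULES))
```

"can" starts with three readings and is left with only the noun reading, while "rusted" keeps its single verb reading. A word the rules cannot safely resolve simply keeps several readings, which is the multitagger behaviour described above.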

I'm not in either camp; I quit the NLP field out of disillusionment, doubting it would lead to anything useful or meaningful.

Both rule-based and statistical approaches are fundamentally flawed because they don't incorporate any real-world information. Humans do language by starting from real-world information and mapping it to grammar or rules. Computers are trying to go the other way, and won't succeed as anything more than toys.

Even in terms of technological progress, both rule-based and model/NN approaches are really bad, since there is no meaningful sense of iterative, step-by-step improvement, unlike CPU chips or memory speeds. They are more of a random search in a vast space: hit or miss, getting lucky or not, which is very bad both as a technology and as a career.

How about looking at it this way. We now have multiple implementations that get 94%+ success _without_ knowing anything about the world. Isn't that remarkable?

Now, to get to 99.4%+, how about we combine tools such as spaCy or Parsey McParseface (love the name, Google) with very simple real-world cognitive models? For the example given, "Alice drove down the street in her car.", a simple cognitive model would _know_ that streets cannot be in cars and so be able to disambiguate. A cognitive model wouldn't know all the facts about the world; it would know certain things about streets and certain things about cars, and be able to infer on the fly whether the relationship between streets and cars matches the first parse possibility or the second. To me this seems like the obvious next step. If it's obvious to me it must have been obvious to someone else, so presumably somebody is working on it.
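A deliberately tiny sketch of that idea: the parser supplies the two candidate attachments for "in her car", and a hand-written world model vetoes the implausible one. The `TYPE_OF` and `CAN_BE_INSIDE` tables and the candidate encoding are hypothetical; building such a model at realistic scale is exactly the hard part.

```python
# Hypothetical world model: the kind of thing each word denotes...
TYPE_OF = {"Alice": "person", "street": "street", "car": "car"}
# ...and which kinds of thing can plausibly be inside which.
CAN_BE_INSIDE = {("person", "car"), ("person", "street"), ("car", "street")}

# Candidate parses of "Alice drove down the street in her car.",
# each reduced to the (thing, container) claim it makes.
CANDIDATES = {
    "noun_attachment": ("street", "car"),  # "the street in her car"
    "verb_attachment": ("Alice", "car"),   # "drove ... in her car": Alice is in the car
}

def plausible(thing, container):
    return (TYPE_OF[thing], TYPE_OF[container]) in CAN_BE_INSIDE

def surviving_parses(candidates):
    """Keep only the attachments the world model does not rule out."""
    return [name for name, (thing, container) in candidates.items()
            if plausible(thing, container)]

print(surviving_parses(CANDIDATES))
```

Only the verb attachment survives, because the model knows people fit in cars but streets do not.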

The 94% success rate is on made-up, limited tests. In the real world, they fail constantly in weird, horrific, or laughable ways. See any Microsoft AI public demo ever. It's like the self-driving car claim of millions of miles without accidents, except that humans took over any time there was a chance of one.

>> We now have multiple implementations that get 94%+ success _without_ knowing anything about the world. Isn't that remarkable?

That success only lasts within the limited context of the corpora used for training. Step outside that and success drops to 60% or much worse. And that's just tagging and the like, shallow parsing. Meaning? Discourse? Don't even think about it.

I suspect another way of looking at it is that these models actually learn about the real world by reading about it in the WSJ -- of course their knowledge of it is not as deep as our own, but good enough for what they do.

That is, if you took a well-trained NLP model, you could in principle extract facts like "streets are not found inside cars" out of it.
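A crude illustration of pulling such a fact back out of text: count "X in the/her/his/a Y" patterns in a corpus. The four sentences below are made up for the example, and a neural parser certainly does not store knowledge as regex counts; this only shows that the containment asymmetry is visible in plain text statistics.

```python
import re
from collections import Counter

# Made-up toy corpus, for illustration only.
TOY_CORPUS = [
    "Alice found the bag in her car.",
    "The car in the street was blocking traffic.",
    "He parked the car in the street.",
    "A dog in a car barked at us.",
]

# "<thing> in the|her|his|a <container>"
PATTERN = re.compile(r"\b(\w+) in (?:the|her|his|a) (\w+)\b")

def containment_counts(sentences):
    """Count (thing, container) pairs matched by PATTERN, case-insensitively."""
    counts = Counter()
    for sentence in sentences:
        for thing, container in PATTERN.findall(sentence.lower()):
            counts[(thing, container)] += 1
    return counts

counts = containment_counts(TOY_CORPUS)
print(counts[("car", "street")], counts[("street", "car")])
```

"car in street" is seen twice and "street in car" never, which is the kind of evidence a model trained on enough text could in principle absorb.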

That's why Chomsky's approach never made sense to me. Purely statistical methods aren't perfect either, but they do include some real-world information implicitly: training sets aren't random, they're taken from human communication.

But those probability models are really far removed from real world/human experience. No human is going to claim a leopard skin sofa could be an actual leopard for example. Not gonna be even a little confused.

There is just a ton of information and context the computer probability models do not have. They can use all the big data they want, but are capturing only a very thin slice of real world information.

> But those probability models are really far removed from real world/human experience. No human is going to claim a leopard skin sofa could be an actual leopard for example. Not gonna be even a little confused.

Mhm.

When humans see a Jeopardy! answer looking for the name of an ancient king, they might give the wrong name, because quick: did Hadrian rule before or after Caesar?

If Watson gets it wrong, its answer is something like "What are trousers?".

It seems quite obvious that different things are going on there.

The problem with statistical information is data sparsity. You could read all English texts ever written (or spoken, for that matter) and the number of meaningful combinations left to see would still be infinite. If you try to learn language only from finite examples, you'll never see enough of it to learn it well. That's why Google reports results against the Penn Treebank. It's not even clear what a good metric would be outside of finite corpora (which, as someone noted above, the field has been overfitting to for decades).

Prior knowledge solves that problem. A human encounters the same sparsity a computer does when learning from text, but prior knowledge allows us to connect rare features to a larger model in which they are, in a way, less rare.
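The sparsity point shows up even at toy scale: the longer the n-grams, the larger the share of held-out n-grams never seen in training. The two "corpora" below are made up for illustration; on real corpora the same trend holds, though the exact numbers depend on corpus size and tokenization.

```python
def ngrams(tokens, n):
    """All contiguous n-token windows, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def unseen_rate(train_tokens, test_tokens, n):
    """Fraction of the test set's n-grams that never occur in the training set."""
    seen = set(ngrams(train_tokens, n))
    test = ngrams(test_tokens, n)
    return sum(g not in seen for g in test) / len(test)

# Made-up toy data, for illustration only.
train = "the dog saw the cat . the cat saw the dog .".split()
test = "the dog saw a bird .".split()
for n in (1, 2, 3):
    print(n, round(unseen_rate(train, test, n), 2))
```

The unseen share climbs from a third of unigrams to three quarters of trigrams. More training data pushes these numbers down; the comment's point is that no finite amount drives them to zero.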

If you think about it, there is an iteration happening within machine learning that is essentially building that prior knowledge about the world by reusing previous models as inputs to new ones. For example, spaCy uses word2vec vectors for parsing and NER, and then sense2vec uses spaCy POS tags to create word vectors.

sense2vec.spacy.io
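A sketch of the sense2vec idea as I understand it: each token is keyed by word plus tag (e.g. `duck|NOUN`), so different senses get separate vectors. Real sense2vec trains word2vec on such keys; here plain co-occurrence counts stand in for that training, and the hand-tagged sentences stand in for spaCy's output. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def sense_keys(tagged_sentence):
    """Turn (word, tag) pairs into sense2vec-style keys such as 'duck|NOUN'."""
    return [f"{word.lower()}|{tag}" for word, tag in tagged_sentence]

def context_vectors(tagged_sentences, window=1):
    """Count which sense keys occur within `window` tokens of each sense key."""
    vectors = defaultdict(Counter)
    for sentence in tagged_sentences:
        keys = sense_keys(sentence)
        for i, key in enumerate(keys):
            lo, hi = max(0, i - window), min(len(keys), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[key][keys[j]] += 1
    return vectors

# Hand-tagged toy sentences (in practice the tags would come from a tagger such as spaCy).
TAGGED = [
    [("I", "PRON"), ("saw", "VERB"), ("a", "DET"), ("duck", "NOUN")],
    [("You", "PRON"), ("should", "AUX"), ("duck", "VERB")],
]
vectors = context_vectors(TAGGED)
print(vectors["duck|NOUN"], vectors["duck|VERB"])
```

`duck|NOUN` ends up with the context `a|DET` and `duck|VERB` with `should|AUX`: the tagger's output has become an input that separates the two senses.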

>> Prior knowledge solves that problem.

Prior knowledge _might_ solve that problem. It's not really solved yet, so who knows. Yes, work is ongoing and word vectors sound cool and all, but in the past people said the same thing about bag-of-words models, and look where we are now.

Humans solve sparsity, sure; we learn language from ridiculously few data points. But who knows what exactly it is that we do? If we knew, we wouldn't be discussing this.

Let's restate the problem to make sure we're talking about the same thing: the number of possible utterances in a given language that are grammatically correct according to some grammar of that language is infinite (or so large that you could keep producing new ones for longer than the universe has left to exist without ever repeating one).
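A back-of-the-envelope version of that restatement, counting raw word strings rather than grammatical sentences (so it is only an illustration of scale): with a vocabulary of $V$ words there are $V^n$ strings of length $n$. For a hypothetical $V = 10^4$ and $n = 20$:

```latex
V^{n} = \left(10^{4}\right)^{20} = 10^{4 \cdot 20} = 10^{80}
```

That is on the order of the commonly quoted estimate for the number of atoms in the observable universe. And once a grammar contains a recursive rule such as $\mathrm{NP} \to \mathrm{NP}\ \mathrm{PP}$, there is no maximum sentence length at all, so the set of grammatical sentences is infinite.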

And it's a problem because it's impossible to count infinity given only finite time. I don't see how prior knowledge, or anything else, can solve this.

Which must mean humans do something else entirely, and all our efforts that are based on the assumption that you can do some clever search and avoid having to face infinity, are misguided and doomed to fail.

I largely agree with Chomsky.

I think both approaches are needed for general AI: neural networks, or something like them, for low level perception and recognition; and symbolic AI for higher level reasoning. Without the symbolic layer, you can't be sure what's going on.
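One way to picture that split, as a hypothetical sketch: a perception stage (here just a dict of confidence scores standing in for a neural classifier) is thresholded into symbolic facts, and a small forward-chaining rule engine, the classic expert-system mechanism, derives conclusions while recording which rule fired. The labels, rules and 0.5 threshold are made up for illustration; the trace is what lets you see what's going on.

```python
# Hypothetical rules: (set of premises, conclusion).
RULES = [
    ({"has_wheels", "on_road"}, "vehicle"),
    ({"vehicle", "has_wings"}, "aircraft_taxiing"),
    ({"vehicle"}, "obey_traffic_rules"),
]

def forward_chain(facts, rules):
    """Fire rules until no new facts appear; return facts and a trace of firings."""
    facts = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                trace.append((sorted(premises), conclusion))
                changed = True
    return facts, trace

def hybrid(scores, threshold=0.5):
    """Turn (stand-in) neural confidence scores into facts, then reason symbolically."""
    observed = {label for label, p in scores.items() if p >= threshold}
    return forward_chain(observed, RULES)

facts, trace = hybrid({"has_wheels": 0.97, "has_wings": 0.04, "on_road": 0.91})
for premises, conclusion in trace:
    print(premises, "->", conclusion)
```

The low-confidence `has_wings` never becomes a fact, so the aircraft rule does not fire, and every conclusion that does appear can be traced back to specific premises.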

Symbolic AI has been very closely guided by cognitive psychology. Artificial neural networks ignore neurophysiology, so even when they work, they tell us very little about how the brain works.

I keep hearing claims that symbolic AI is the wrong approach for anything, and that it failed. Yet there were quite a few successes (expert systems, discovery learning, common sense reasoning, for example) before sources of funding dried up.

> Artificial neural networks ignore neurophysiology, so even when they work, they tell us very little about how the brain works.

That is completely wrong. People like Geoff Hinton spend most of their time thinking about how the brain works (indeed, his background is in cognitive psychology). The "convolution" part of convolutional neural networks is designed to mimic how the optic nerve interfaces with the brain.
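What the "convolution" in a CNN amounts to, as a minimal pure-Python sketch: each output unit looks only at a small patch of the input (a local receptive field), and all units share the same kernel weights. Like common deep-learning libraries, this computes cross-correlation (no kernel flip) and calls it convolution. The function name and the toy edge-detector kernel are illustrative assumptions.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (cross-correlation) of nested lists, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Local receptive field: only the kh x kw patch at (r, c) is visible.
            # Weight sharing: every output unit uses the same kernel.
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

image = [[0, 0, 1, 1] for _ in range(3)]  # dark-to-bright vertical edge
kernel = [[-1, 1]]                        # horizontal difference detector
print(conv2d_valid(image, kernel))
```

Every row of the output is `[0.0, 1.0, 0.0]`: the same two shared weights respond only where the patch straddles the edge.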

> I keep hearing claims that symbolic AI is the wrong approach for anything, and that it failed. Yet there were quite a few successes (expert systems, discovery learning, common sense reasoning, for example) before sources of funding dried up.

The funding dried up because they ran into the limits of what is possible.

No equivalent of error backpropagation has ever been found in real neurons, and it's biologically implausible. So ANNs are almost certainly using a different learning mechanism from the one used in the brain. Even single neurons are quite complex and very little of this complexity is present in neural networks.
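For concreteness, here is what backpropagation computes in a tiny network: one tanh hidden layer, one linear output, squared error. One commonly cited objection to its biological plausibility is visible in the backward pass: the error sent to each hidden unit is weighted by the same forward weights `W2` (the so-called weight transport problem). This is a sketch with hypothetical names and toy numbers, not a model of any specific argument in the thread.

```python
import math

def forward(x, W1, W2):
    """One tanh hidden layer (weights W1) and a single linear output (weights W2)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = sum(w * hi for w, hi in zip(W2, h))
    return h, y

def backprop(x, target, W1, W2):
    """Return the squared-error loss and its gradients w.r.t. W1 and W2."""
    h, y = forward(x, W1, W2)
    loss = 0.5 * (y - target) ** 2
    dy = y - target                                   # dL/dy
    dW2 = [dy * hi for hi in h]                       # dL/dW2
    # The error sent back to each hidden unit is weighted by the *same*
    # forward weights W2: this reuse is the "weight transport" step.
    dh = [dy * w for w in W2]                         # dL/dh
    dz = [d * (1 - hi ** 2) for d, hi in zip(dh, h)]  # tanh'(z) = 1 - tanh(z)^2
    dW1 = [[d * xi for xi in x] for d in dz]          # dL/dW1
    return loss, dW1, dW2

x, target = [1.0, 2.0], 0.25
W1 = [[0.1, -0.2], [0.3, 0.4]]
W2 = [0.5, -0.6]
loss, dW1, dW2 = backprop(x, target, W1, W2)
print(loss, dW1, dW2)
```

The gradients can be checked against finite differences of `loss`, which is a quick way to confirm the backward pass is right.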

The visual system (retina, lateral geniculate nucleus, visual cortex) was fairly well understood well before ANNs were developed. A few uncontroversial ideas (e.g. that cells take their inputs from neighbouring cells in the previous layer) were adopted for use in ANNs.

I was around at the time of, and affected by, the AI winter. There was certainly no consensus among those working in AI that they had got as far as they could. Work stopped when funding was cut, often for political reasons.

The most mature area at the time, apparently ripe for commercialization, was expert systems. However, it was very hard to commercialize them: customers couldn't think of any suitable applications, and when they could, they couldn't spare the time of their experts.

Finally, the main reason for the AI winter was probably that AI was unable to live up to the grossly inflated expectations, simply because the expectations were grossly inflated. This seems to be happening again, with neural networks.

> I was around at the time of, and affected by, the AI winter. There was certainly no consensus among those working in AI that they had got as far as they could. Work stopped when funding was cut, often for political reasons.

I wasn't around, but I got curious about symbolic systems after listening to MIT's AI course[1]. Did some reading about the subject. The impression I got matches what you describe.

It's ridiculous how many people here dogmatically recite statements about failures of symbolic systems without (apparently) knowing anything about how those systems were used and what they achieved. If you listen to the comments, it sounds as if research on symbolic systems only ever produced crude, useless toys. That was certainly my impression before I took some time to actually look into it. A bit of straightforward Googling can show that it's a gross misrepresentation of history. For example, MIT's lecture on knowledge engineering [2] has some really interesting info on this subject.

[1] http://ocw.mit.edu/courses/electrical-engineering-and-comput...

[2] http://ocw.mit.edu/courses/electrical-engineering-and-comput...

I've done symbolic AI work. It's great within limits. Deep learning on its own isn't the complete solution either, but statistics and learning are more important than symbolics for achieving breakthrough performance.

I'd invite you to read "The Master Algorithm" to understand exactly how they failed the first time and how they aren't the route forward: https://en.m.wikipedia.org/wiki/The_Master_Algorithm

Hinton, "How the brain does back-propagation": https://youtu.be/kxp7eWZa-2M?t=38m13s

If you want to be convincing, give us links to actual neurology research, not to Hinton "explaining away" the objections of actual neuroscientists to his suppositions about the human brain by making more suppositions. It's pretty obvious that he made up his mind decades ago and isn't going to be critical of his own theories.

I mean, really, this is quite an argument:

1) You have a working system. You know only bits and pieces of how it works.

2) You build a crude model of the system. It kind of sucks at the stuff the system does well.

3) People over several decades apply tons and tons of task-specific optimizations and modifications to your model. Those modifications have nothing to do with the original system, but because of them the model finally achieves good performance at some tasks.

4) You use the hype generated by #3 to claim that you were right all along and that your model captures the essential aspects of the original system.

5) When people point out that your model works in ways that clearly don't match the original system, you claim that it's the original system that approximates your model, not the other way around, without any observations of the original system supporting your claim.

> People like Geoff Hinton spend most of their time thinking about how the brain works (indeed, his background is cognitive psychology).

If that was as significant a factor as you make it sound, the progress in artificial neural networks would be closely tied to the progress of neurology. So where are all the citations of neurology and cognitive psychology papers in recent AI/ANN research?

Because we are so far from being able to simulate biological systems, it is easier to do other things.

There is some work on this, though, often going the other way: see for example http://news.discovery.com/tech/robotics/brain-dish-flies-pla... and, even more extreme, http://www.nature.com/articles/srep11869

Chomsky was never really interested in AI at all. I don't see anything in these results that has any implications for any position that Chomsky's taken. Chomsky has always pretty much taken it for granted that surface constituent structure can be extracted by statistical methods.

I think you are right, and I think similar hybrid processes happen in the human brain to make sense of the world. In the end, strong AI will look very much like a massive hybrid system with a conscious controller that takes in and integrates that information into an understood model of the world.

Try writing to Chomsky. He's well known for personally replying to email. I know that for a fact; he replied to one of mine once (not about language).

Please don't. If you're going to bother the man, ask a more interesting question. He's made his position on this abundantly clear in the Chomsky-Norvig debate.