by make3 1839 days ago
This intuition is very dangerous and leads to huge misconceptions about deep neural nets.

Neural nets don't learn anything like us, and they don't reproduce our functions. We build on massive amounts of general symbolic knowledge, and can easily do zero-shot tasks (without explicit examples).

Neural networks really should be seen as just giant random functions that you progressively modify in tiny ways until they fit your data. As the parent says, we've just been lucky or good at constraining these functions so that they can only learn useful functions (e.g. convnets), or so that they somehow learn useful functions more quickly.
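As a minimal sketch of that "random function nudged in tiny steps" picture, here is plain gradient descent on a two-parameter linear model in pure Python. The model, the toy data, the learning rate and the step count are all assumptions for illustration; a real network has millions of parameters and gets its gradients from backpropagation, but the update loop has the same shape.

```python
import random

def fit_line(xs, ys, lr=0.05, steps=2000, seed=0):
    """Start from random parameters and repeatedly nudge them downhill
    on the mean squared error until the function fits the data."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initial function
    n = len(xs)
    for _ in range(steps):
        # gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # tiny modification in the direction that reduces the error
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # toy data generated by y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```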


Humans certainly do not build on massive amounts of symbolic knowledge, because we are absolutely terrible at symbolic knowledge. Reliably reasoning through a basic logical argument is a specialist skill. Even reviewing evidence before making decisions is uncommon; most humans operate on a look -> assess -> do model where the tricky bit is well approximated by a neural net. That is why neural nets seem to be so good at real-world tasks.

It is completely plausible that when neural nets get scaled up to something approaching human-brain numbers of connections they will well approximate a human brain or be a few tweaks away. Obviously it won't be knowable until state of the art gets there, but there is no reason to think human intelligence is going to be complicated. It is one evolutionary step up from some pretty basic animals.

Maybe you’re talking about a different kind of symbolic knowledge than the OP. To give an example humans can instantly tell whether an arbitrary sentence is grammatical or not which is a deep kind of symbolic reasoning that computers absolutely cannot do right now. And humans can also get the semantic meaning.

Then again math is hard for us. So I think there are nuances.

The fact that computers can't do sentence grammar and meaning right now doesn't tell us much about similarities or differences between humans and neural nets. It just tells us that training a neural net purely on a big corpus isn't enough to derive semantic meaning, and makes it hard to work out grammatical structure. No human has ever tried to do that either; everyone comes at text with some real-world experience. So we don't know how well humans would do at it. Probably terribly.

It is reasonable to believe that written language is easier for a neural net to learn if the net is trained on both images and words, so that it can form visual links for words. Maybe that takes more computational grunt than we have at the moment. The failure so far proves nothing.

The argument was about whether humans and neural nets learn in a similar way. I don't see how what you are saying has any impact on that.

"instantly tell whether an arbitrary sentence is grammatical or not"

You do realize we can train a neural network to perform this task? It is a binary classification problem. When I look at a grammatically incorrect sentence I don't do much symbolic reasoning - it just feels "wrong" to me. It does not match any patterns I have in my head for grammatically correct sentences. There's a lot of pattern matching in our thinking process.
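A toy sketch of framing grammaticality as binary classification, assuming bigram-count features and a single perceptron (the simplest one-neuron network) as a stand-in for a deep net. The six labelled sentences are invented, and `bigrams`, `score`, `train_perceptron` and `predict` are hypothetical helper names. This only shows the shape of the problem, not that a real classifier would reach useful accuracy.

```python
from collections import Counter

def bigrams(sentence):
    """Count adjacent word pairs, with start/end markers (hypothetical feature choice)."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    return Counter(zip(words, words[1:]))

def score(weights, bias, feats):
    """Linear score of a single neuron: bias + sum of weight * feature count."""
    return bias + sum(weights[f] * c for f, c in feats.items())

def train_perceptron(examples, epochs=50):
    """examples: (sentence, label) pairs, label 1 = grammatical, 0 = ungrammatical."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        mistakes = 0
        for sentence, label in examples:
            feats = bigrams(sentence)
            pred = 1 if score(weights, bias, feats) > 0 else 0
            if pred != label:
                # classic perceptron update: move the weights toward the correct label
                step = 1 if label == 1 else -1
                for f, c in feats.items():
                    weights[f] += step * c
                bias += step
                mistakes += 1
        if mistakes == 0:  # every training sentence is classified correctly
            break
    return weights, bias

def predict(weights, bias, sentence):
    return 1 if score(weights, bias, bigrams(sentence)) > 0 else 0

# Invented toy data: word-order scrambles serve as the ungrammatical examples.
train = [
    ("the dog runs", 1), ("a cat sleeps", 1), ("the cat runs", 1),
    ("dog the runs", 0), ("cat a sleeps", 0), ("runs cat the", 0),
]
weights, bias = train_perceptron(train)
print([predict(weights, bias, s) for s, _ in train])
```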

What's missing in the current generation of neural networks is efficient information storage and ability to recall that information (e.g. lookup) or update it (direct write).

"You do realize we can train a neural network to perform this task"

I'm doing a master's in deep learning for NLP and I'm not sure we can. Language modelling can't do this because grammatical yet semantically implausible combinations of words are assigned very low probability (i.e. high perplexity), the classic example being Noam Chomsky's "Colorless green ideas sleep furiously".
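For reference, perplexity is the exponentiated average negative log-probability a language model assigns per token, so an unlikely word sequence gets a high perplexity. The per-token probabilities below are made up, not taken from any real model; the point is only that a sentence the model finds implausible scores worse whether or not it is grammatical.

```python
import math

def perplexity(token_probs):
    """Perplexity from the probability a language model assigned to each token:
    exp of the average negative log-probability."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# Hypothetical per-token probabilities for two four-word sentences.
plausible = [0.2, 0.1, 0.3, 0.25]
implausible = [0.01, 0.002, 0.005, 0.01]  # e.g. a grammatical but odd sentence

print(perplexity(plausible), perplexity(implausible))
```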

What would be a training set for this? I assume we would first try to do parsing to extract the grammatical role of each word. Then what would be the dataset? A massive attempt at generating the set of all possible trees that are grammatical?

I guess we could use massive textual datasets from reputable sources, extract their grammatical role trees, and learn from that. Generating negative examples with sufficient coverage would be very hard. Strict generative modelling without negative examples with good coverage would run into the same problem as language modelling, where acceptable but unlikely examples would get low probability (high perplexity) despite being grammatical.

It would seem to me that in order to generate negative examples with good coverage, you would need a hand-written program with a definition of what grammaticality means, which would make training a neural network pointless to begin with.

Seems like the experts agree with my take: https://linguistics.stackexchange.com/a/1108

Constructing a training dataset is a separate problem. You could potentially crowdsource enough negative examples. Once you have the dataset, a neural network would most likely be able to learn to classify sentences with a reasonably good accuracy.

Unlike current DL models, humans have a world model (common sense) which is formed through an ability to create/update/lookup explicit rules/facts. Once we figure out how to incorporate that into a learning algorithm and/or a model architecture, AI will become a lot smarter.

If we can train a computer to classify sentences as grammatical or not please let me know where. You’ll save the linguistics department a lot of money as they’ll no longer have to contact native speakers for this research.
Humans require fewer examples to learn language rules. It's not clear that humans use the same learning model as a "deep net."

Humans also require a lot of examples to learn a language - years of everyday practice for a young human. Learning algorithms are not the same, but you still need to train a large neural network - lots of neurons with lots of connections (weights) - whether it's in your head or in a datacenter.

There’s some evidence that humans have a Universal Grammar and learn through deletion. And humans cannot learn just any old language, only a restricted class, whereas there’s no reason to think that an ML model would have that problem.

I’d encourage you to read a little more about the topic with an open mind. You might learn something.

Neural nets fundamentally cannot operate the same way a brain does, because they cannot create an abstract representation of a problem, and then gradually and deliberately manipulate that mental model until they develop a solution. They just don't work that way, with current structures. They basically apply a single pass of a very complex function to the data, and spit out a result.

That isn't a problem of scale, it's a problem of architecture. This is one of the reasons DeepMind decided to tackle StarCraft. It's very difficult to solve StarCraft without your AI having some ability to develop and then manipulate a mental model of the game, because that's what you need to construct and unfold original, non-linear strategies.

Neural nets generalise because they have to approximate the data at a lower resolution; it's not that they're constrained to only learn what is useful. They're lossy compressors, but they have a unique property that most lossy compressors don't have. They cannot learn all the properties of the input data - partly because they can't hold that much information, but mainly because neurons cannot be modified in isolation. A change in one neuron changes the influence of every other neuron in that layer on the next layer. So it's difficult to learn granular properties of specific examples, because the entire net is affected when you do that (and many granular properties that are learned will be unlearned in subsequent examples). The deeper the net, the less able earlier layers are to extract granular information from the input. They have to extract very abstract information, and they will gradually converge on an abstraction strategy that works.
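A small sketch of the "neurons cannot be modified in isolation" point, assuming a toy two-layer tanh network with hand-picked weights and one plain SGD step on squared error. Training on example `x_a` alone still shifts the output for `x_b`, which was never trained on, because both inputs share the same output weights `v`. All names and numbers here are hypothetical.

```python
import math

def forward(W, v, x):
    """Two-layer net: hidden = tanh(W x), output = v . hidden."""
    hidden = [math.tanh(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in W]
    return sum(v_i * h_i for v_i, h_i in zip(v, hidden)), hidden

def sgd_step(W, v, x, target, lr=0.1):
    """One gradient step on 0.5 * (y - target)^2 for a single example."""
    y, hidden = forward(W, v, x)
    err = y - target  # derivative of the loss with respect to the output y
    new_v = [v_i - lr * err * h_i for v_i, h_i in zip(v, hidden)]
    new_W = []
    for row, v_i, h_i in zip(W, v, hidden):
        # backprop through tanh: d tanh(a)/da = 1 - tanh(a)^2
        delta = err * v_i * (1 - h_i ** 2)
        new_W.append([w_ij - lr * delta * x_j for w_ij, x_j in zip(row, x)])
    return new_W, new_v

W = [[0.5, 0.4], [0.2, 0.8]]
v = [1.0, 1.0]
x_a, target_a = [1.0, 0.0], 2.0  # the only example we train on
x_b = [0.0, 1.0]                 # an example we never train on

before_b, _ = forward(W, v, x_b)
W2, v2 = sgd_step(W, v, x_a, target_a)
after_b, _ = forward(W2, v2, x_b)
print(before_b, after_b)  # x_b's output moved even though only x_a was trained
```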

That's why residual blocks are interesting. They pass that low-level information to later blocks (which have an easier time processing the granular details) while also leveraging the ability of earlier blocks to extract abstract information. It allows you to extract and combine information at multiple levels of granularity (or abstraction).
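A minimal sketch of the residual idea, output = x + f(x), assuming a single fully connected tanh layer as the inner function f (real ResNet blocks use convolutions, batch normalisation and ReLU). With all weights at zero, the plain layer throws the input away while the residual block passes it through untouched, which is the sense in which low-level information reaches later blocks.

```python
import math

def dense_tanh(W, x):
    """A plain layer: tanh(W x)."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in W]

def residual_block(W, x):
    """Residual block: output = x + f(x). The input x is carried forward
    unchanged and the layer only has to learn a correction on top of it."""
    fx = dense_tanh(W, x)
    return [xi + fi for xi, fi in zip(x, fx)]

x = [0.3, -1.2, 2.0]
zero_W = [[0.0] * 3 for _ in range(3)]  # hypothetical untrained (all-zero) weights

print(dense_tanh(zero_W, x))      # plain layer: the input is lost
print(residual_block(zero_W, x))  # residual block: the input survives
```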

Convnets are also invariant to certain transformations (e.g. translation, and to some degree scale), which I think is a better definition than "can only learn something useful." They're forced to learn information that is more general, which increases the usefulness of each bit, which means you get a higher density of usefulness per FLOP. But you also lose specific information in that process. What if location is meaningful? For example, audio spectrogram analysis can suffer from that property, because the specific location on the Y axis (frequency) is highly meaningful.
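A sketch of the translation point in pure Python, assuming a 1-D "valid" cross-correlation (which deep-learning libraries call convolution) with an invented two-tap kernel. Strictly, the convolution itself is translation-equivariant: shifting the input shifts the feature map. Invariance comes from pooling, and a global max pool also throws away where the feature fired. On a spectrogram's frequency axis, that position is exactly the information you might need.

```python
def conv1d(signal, kernel):
    """'Valid' 1-D cross-correlation: slide the kernel, take dot products."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

kernel = [1.0, -1.0]          # hypothetical edge-detector kernel
a = [0, 0, 5, 0, 0, 0, 0, 0]  # a spike at position 2
b = [0, 0, 0, 0, 0, 5, 0, 0]  # the same spike shifted to position 5

fa, fb = conv1d(a, kernel), conv1d(b, kernel)
# Equivariance: the feature map of b is the feature map of a shifted by 3.
print(fa)
print(fb)
# Global max pooling gives the same value for both, so the position is gone.
print(max(fa), max(fb))
```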

What I meant by "forced to learn something useful" is what you put more clearly as being forced to generalize.