Most comments here are to the tune of "Well, DL is just a bunch of correlations and statistics; it's not really understanding anything."
Ok, well I can also say "humans are just a bunch of chemical reactions and electrical signals."
The beauty of DL is in its simplicity, and really we're only at the very starting point of seeing it work with extremely sparse networks (compared to biological intelligence). The fact that it works so well with such limited data in narrow domains should be energizing.
I really do want to ask the author this question, given that he focuses so much on the "weird" idea that everything turns out to be just numbers moving around over time:
"What do you suppose the input to the human brain looks like?"
Since having kids, I have come to realize that what you see in neural nets you also see in human beings. Understanding exists, but it is mostly not how human beings respond to the world around them. Mostly we are a minimally generalized dictionary: we know a long, long, very damn long list of "tricks". If A happens, B will follow. There's very little along the lines of "objects fall along a parabolic trajectory".
This leads to generalization errors, and the surprising thing is that you see those in humans! Kids who have learned to open one type of door do not know how to deal with an (even very slightly) jammed doorknob, they don't recognize differently shaped doorknobs as doorknobs, etc. For the first few days they don't even realize that if pushing doesn't work, pulling might. So the understanding of opening doors really does start out on the level of "move the free end of the small cylindrical object in the middle of the door that's parallel to the floor down, and then push", and if any of those conditions fails, well, the door's going to stay shut.
And this is exactly the very hard problem you encounter with neural nets: finding the right balance between specificity and generalization. But one saving grace is that if you specialize in enough special cases, you can get around without having a general understanding, and that's exactly what's happening with kids.
Babbage is said to have owned a dancing automaton he called "The Silver Lady" that was delightfully lifelike in its movements. I wouldn't say that such a device "understood" dance, no matter how perfectly it moved.
Given today's technology and sufficient time, you could devise an AI that could watch dance videos, "understand" dance, and create its own Silver Lady.
You're arguing a strawman. I never claimed that understanding was based on a phenomenological evaluation of an output. Rather, reductionism is not an argument against complexity.
You describe a robot that, from a phenomenological perspective, appeared to embody an understanding of dance because it moved in a graceful, dancelike way. Then you imply that this system was certainly unaware of the complexity of dance, however graceful it could be. That is, it didn't "understand" dance in a more abstract way.
I would not argue otherwise, nor was this line of reasoning in question.
So you aren't arguing against my point, that reducing the argument of understanding to: "Well X is just [a, b, c]" is a bad argument. Instead you argue against the unstated claim that "X systems that look like they embody Y actually have an understanding of Y," in the sense that a human would "understand" Y. That is a strawman and not what I am claiming.
Okay, I get that I was arguing past you, not to your point.
> reducing the argument of understanding to: "Well X is just [a, b, c]" is a bad argument
I think when you say, "humans are just a bunch of chemical reactions and electrical signals" you're stating a hypothesis, not a fact. But we know DL et al. is just mathematical machinery.
It may turn out that consciousness is somehow the result of mechanics (I do not believe it, but that's beside the point) or it may turn out that what we are, the "thing" that understands, is somehow beyond mechanical systems. I feel like I should clarify that when I say "understanding" I mean more than that there is some mechanism that can perform a complex mapping from inputs to outputs (for example, a chess-playing AI doesn't "understand" chess in the sense I mean). There is some "self" that understands, and this is directly tied to conscious subjective awareness.
In one of your other comments on the same article you say:
> As to the question of consciousness, it is yet to be well defined, with no possibility to test (because of e.g. qualia), so by definition you could never verify it either way. At most you'd recognize what you perceive as consciousness based on how you perceive other entities which you believe have it.
Consciousness cannot be defined, as you say, because it has no qualities. And it cannot be scientifically studied for the same reason. However, there is a method to "detect" it in other systems, to wit: merging. Two or more conscious systems can voluntarily merge, creating a new conscious entity partaking of but greater than its members. This isn't widely discussed or even known in AI and consciousness debates, so I wanted to mention it.
The mysticism around ‘Emergence’ is just a modeling error where people only abstract in one way (say, down to cells) and don’t include something important like the interaction between cells in their reductionist model of the system. It’s like creating a graph without the edges. And so when those effects have manifest consequences at a higher level, it feels like they appeared as if by magic.
It's kind of like saying, "This river is not the same river that it was upstream, and yet it is. The river is not the same water as last year, and yet it is the same river."
The phenomenon is entirely well understood by all involved, and yet coming up with a reasonable definition is hard. http://existentialcomics.com/comic/164 So it's easier to be mystical.
I think the author is stretching the arguments here a bit - DL is just partitioning space according to some pre-baked associations given to it during training; in this case it's more like a non-linear optimization where we want to end up with N-million-dimensional objects of a certain shape, obtained by optimizing some objective function that lets the model predict similar associations. It doesn't have much to do with the actual innate quality of understanding. Maybe reinforcement learning combined with deep learning (DRL) can move us towards such a quality, at least in a mechanical sense.
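To make the "partitioning space" picture concrete: a network built from ReLU units computes a piecewise-linear function, so each pattern of active/inactive units marks out one region of input space with its own linear map. The minimal sketch below uses hand-picked, hypothetical weights for a single hidden layer (not a trained model, and not anything from the article) and counts the distinct activation patterns seen on a grid of 2-D inputs.

```python
import itertools

# Hypothetical hand-picked weights for one hidden layer of 3 ReLU units
# acting on 2-D inputs; a trained network would learn these instead.
W = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
B = [-0.5, -0.5, -1.2]


def activation_pattern(x, y):
    """Return which ReLU units are active (pre-activation > 0) at (x, y)."""
    return tuple(w0 * x + w1 * y + b > 0 for (w0, w1), b in zip(W, B))


def count_regions(steps=100):
    """Count distinct activation patterns on a grid over [0, 1] x [0, 1].

    Within one pattern the layer acts as a single affine map, so each
    distinct pattern corresponds to one linear region of the partition.
    """
    grid = [i / steps for i in range(steps + 1)]
    patterns = {activation_pattern(x, y) for x, y in itertools.product(grid, grid)}
    return len(patterns)


print(count_regions())
```

The three units correspond to the lines x = 0.5, y = 0.5 and x + y = 1.2, which cut the unit square into 7 regions; training amounts to moving such boundaries around so the regions line up with the labelled data.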
That's got to be one of the most concise but still complete descriptions of deep learning that I've seen so far.
The question that it implicitly raises (at least, with me) is how can we tell the difference between 'understanding' and 'deep learning' if the end results are the same?
To me 'reasoning' is a slow, conscious process, and understanding is a part of that. But classification problems, especially when done by humans trying to work fast, leave no room for such conscious decision making; we go much faster than that and outsource the job to our subconscious. Predictably, the error rate goes up, and in those kinds of situations deep learning can already outperform humans on the same tasks today.
The weird thing is that deep learning solutions can get simple cases completely wrong, where a human would never err, and yet get some of the hardest cases - where a human would be very likely to make an error - right. It's baffling.
On one hand, I sort of agree with you. On the other hand, what you are saying feels a little bit like saying that humans aren't impressive, because we are just atoms.
Sometimes interesting things arise from many small, simple parts.
> DL is just partitioning space according to some pre-baked associations given to it during training
What I wonder is whether that's not also maybe the cornerstone of human understanding. If I understand correctly, you are essentially saying that DL is forming categories, or developing a classification scheme. Granted, if we're only talking about supervised DL, and the program is practically told where to form the boundaries—then it's not very impressive. But if the software is extracting statistically prominent commonalities and using those to form category boundaries, and arranging them hierarchically—then while the implementation may be totally different from human understanding, the effect seems to strongly overlap.
I assume I'm probably just missing something—anyone know what it is? (It seems clear that at least part of the problem here is that 'human understanding' has been left far more vague than DL, and in order to say one way or the other how much they have in common, we need to better define 'human understanding'.)
I think what DL is capable of is already very impressive; sometimes while playing with DL in NLP I am wondering if our languages aren't way simpler than we think/hope and we aren't that intelligent either.
There is one massive difference between humans and DL: humans can mimic something just from a single (even partial) observation; DL requires huge amounts of data and massive parallel processing, something that we got only recently with Big Data and GPUs.
DL also suffers from the curse of dimensionality: in theory, deep fully connected networks should be able to do everything better; in practice they are awful, and only cleverly constructed schemes like CNNs, LSTMs etc. that assume the data to be in a certain format/domain bring impressive results when paired with optimizers/metrics that magically work on a given dataset. If you could construct a DNN that figures out PCA/ICA/eigenvalues/etc. on its own during training, the way it does with convolutions, that would again enable another set of magic tricks. In any case, humans still have to figure out a network architecture that works (even if we now have AutoML for finding the best hyperparameters in parallel).
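One concrete face of the curse of dimensionality is distance concentration: as dimension grows, the nearest and farthest of a set of random points end up at nearly the same distance from a query, so "closeness" carries less and less information. A minimal stdlib sketch, assuming points drawn uniformly from the unit hypercube (an illustrative choice, not something from the comment):

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Return (max - min) / min of distances from a query to random points.

    The query and all points are drawn uniformly from [0, 1]^dim.
    A value near 0 means every point is roughly equally far away.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, p))
    return (max(dists) - min(dists)) / min(dists)


for d in (2, 10, 100, 1000):
    print(d, round(distance_contrast(d), 3))
```

The printed contrast should shrink sharply as `dim` grows, which is one reason generic fully connected models struggle on raw high-dimensional inputs without structural assumptions like those built into CNNs.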
Then the hard problem of consciousness; I personally believe we are far off and are probably missing something very important in our understanding of the Universe.
> humans can mimic something just from a single (even partial) observation;
Human learning is similarly based on large numbers of exposures; we don't form categories from single exposures. There are times when it looks that way, but what's happening instead is that we come across a new particular instance of an already known general category. For example, I have the general category 'cat', and I know cats can vary along dimensions like color, size, unruliness, etc.; I've never seen a purple cat before, but I understand it after one exposure because it's just another value for an already known attribute. The less obvious examples are just super abstract, but have the same basic relationships in place.
> Then the hard problem of consciousness
That, by definition, doesn't have to do with any states of physical matter, nor any kind of computation. It's asking about the subjectivity of state transitions. So it should not be involved in considering a functional equivalent for (important subsets of) human brain behavior.
Edit: to clarify about the 'hard problem' relation to this: if you take Searle's 'Chinese Room' critique, for the question of functional equivalence it doesn't matter whether the person in the room understands Chinese or not; it just matters that at the end of the day the correct cards are held up.
> Then the hard problem of consciousness; I personally believe we are far off and are probably missing something very important in our understanding of the Universe.
Exactly, and that's because consciousness is outside the grasp of science (see my other comment in this thread).
I don't see why "understanding" is equivalent to mere pattern recognition. Even using this word "recognition", what does that mean? It's another word like "understand". These algorithms are just pattern patterning. They don't even know they are patterning, that is a meta-property assigned in (or by) a context.
I agree, but I think human knowledge can be represented as a graph, hierarchy or network of patterns. For example, my knowledge of the letter 'A' is a network of connections to patterns like 'language', 'English', 'alphabet', and whatever else, and if a computer can do the same, it can use that network (as a whole, separate entity) to make a decision, so to speak.
Consciousness does come into it, since we have a pretty visceral sense of it, especially when we mentally trawl through our patterns to make up some story. But really, understanding should just be creating new patterns from existing patterns, plus the ability to use them as distinct entities in some way (rather than having them emerge implicitly in the system and only be used by accident, say as emergent behavior that occurs randomly because of local constraints).
> I don't see why "understanding" is equivalent to mere pattern recognition.
You're underestimating what goes into high accuracy pattern recognition as well as assuming that patterns exist for only one vector and in a single context.
If I asked you to explain how you "understand" some concept, it will inevitably be how the structure and mechanics of it relate to others and in what context. All of those are simply patterns that are abstracted or made more granular.
For example, how do you "understand" what a car is? You would inevitably describe some mechanical definition of a car and the context in which a car operates. So it's a contained combination of metal and plastic objects and usually liquids, with a mechanism to transfer power through gearing and wheels, a compartment for humans, some control mechanisms, etc. (the technical definition), but it can't operate in water (that's a boat) or in the air (that's an airplane).
Each of these things is learned through exposure over time, and recognized as connected, to come up with an "understanding" of a car even before it's formally defined. This is why children ask if cars can fly or go in the water.
He's making the age-old mistake of conflating the mapping of inputs to outputs with intelligence.
Intelligence is not defined by the ability to recognize letters. Or play a game of Go.
Deep learning is a powerful tool for creating systems that have an ability to map inputs to outputs with very noisy, non-linear or complex data.
The mapping itself may be complex, but it's not going about solving problems like a person would. It has no idea what letters are, and how they fit into its world. It has no concept of self, cannot contemplate its own existence -- and perhaps most important of all, has no free will.
The moment we have some kind of deep learning or AI that has free will and can express interest in something other than what it has been trained on, I would say we are closer to unraveling the mystery of consciousness and human intellect.
Even babies and animals exhibit many forms of free will, decision making, and novel behavior that cannot be explained by our current observations of rote deep learning techniques.
You are just thinking of very simple scenarios of supervised learning. Even the simplest of other examples, like playing games, can be thought of as decision making (or free will?). Then there are the areas where deep learning research is heading, e.g. neural Turing machines. That work has only just arrived and does not work great yet, but if the concept is successful, it could be thought of as free will by all definitions.
Isn't free will a non-deterministic thing? NTMs, DNCs etc. are very promising, but in the end they are programs run by a Turing machine.
Are free will/consciousness computable? This is the real question IMO.
Whoo boy not sure that's a good rabbit hole to go down. If you're unfamiliar with compatibilism I'd suggest you check it out. I think hard determinism gives the most reasonable answer here with a resounding no.
As to the question of consciousness, it is yet to be well defined, with no possibility to test (because of e.g. qualia), so by definition you could never verify it either way. At most you'd recognize what you perceive as consciousness based on how you perceive other entities which you believe have it.
This is all a bunch of sentimental bullshit. I don't know why people pursue this doomed line of reasoning. The problem is that you can't distinguish free will from not free will in any meaningful way.
A much more fruitful challenge to the aliveness of computers is to ask a singularitarian to show us any deep neural net that can fold proteins in constant time like physical reality can, instead of the exponential time a computer algorithm needs. Then I will believe that computers are alive and mind uploading is possible.
I really like your argument about protein folding. It's another area of interest in reversible computing. That is, we will never be able to create a more powerful computational structure that is itself an implementation of the state of matter. It will always be less efficient and require more space or energy to represent than nature needs, unless we can find methods or create artificial substances or circumstances that would otherwise be impossible or improbable for nature to create. Man-made substances come to mind, which have structures and mechanical properties that far exceed anything found naturally.
I'm no expert on protein folding but isn't there a more mundane explanation? There's plenty of physical processes that a computer can't simulate because reality is massively parallel to a degree that a computer isn't. I'm not sure that would explain the constant vs exponential discrepancy but I'm not sure I see the connection between that and "aliveness".
Or maybe I do. I'm seeing a glimmer but I don't want to put words in your mouth.
Consciousness is likely just a whole bunch of computation.
I suspect "What is consciousness?" will go the way of "What is life?". We more or less understand the things that make up a bacterium. Those components aren't alive, although the bacterium is. So it's just a matter of definition.
Consciousness is misunderstood by a surprisingly large number of smart people. The common view is that there's science and that's it, when actually science just describes the patterns of what we observe via consciousness, which is in a way above science.
Regarding "what is life?", that's fundamentally different. Life can have fairly concrete definitions: basically, it's physical matter with specific properties, that's it. With consciousness, it's much more complicated. Defining, say, the feeling of pain as physical matter with specific properties doesn't make much sense: "Pain is when these neurons are charged."
Also, what is a computation? A falling rock does perform a computation of a physical process. Any physical system can be said to perform a computation - or even a myriad of different computations, depending on how the physical state is interpreted.
That means you can create consciousness by simulating a Turing machine with pen and paper, or by positioning sand systematically. You can encode its memory in different ways by giving different meaning to different positioning of sand. So randomly throwing sand around could create a Turing computation of consciousness (and all kinds of feelings) with the right choice of encoding.
> Computers understand things as well as us, perhaps better.
If this was limited to chess, I would unquestionably agree.
If it was limited to image recognition, I would tentatively agree, although things like [0] make me cautious (admittedly, that was from March, and I'm not familiar with progress since then).
However, the author seems to be generalizing beyond those two domains, to the limits of human understanding. That seems like a couple-orders-of-magnitude leap too far to me. For example, I don't know of any autonomous system capable of understanding a short novel with simple language and writing a one-page summary of it, as might be expected of a human ten-year-old.
To what extent are those universal perturbations causing problems due to insufficient image augmentation? Or due to the deficient optimizers used while training CNNs (all optimizers are just heuristics with nasty failure cases)? Could we train a GAN-like DNN on those perturbations to make their effect disappear?
Perhaps your reference [0] would work on the human brain too, if only we could know all the weights assigned to all neurons/axons of the given human this should apply to :)
> Now I find it hard to hold on to the belief that I understand what is "A" and what is "B", while computer can only compute.
Humans being surprised by the computer should not be the yardstick for AI. A trained neural net can recognize the letter "A" and differentiate it from things that are not "A" but it does not know that "A" is part of the Latin alphabet and that there are other alphabets that form written human languages.
The day the computer spontaneously invents a new and usable alphabet without having been specifically designed to do so is the day I will concede we have hard AI. We have a long way to go. Until then it's just a bunch of hotdog/not hotdog classifiers.
> The day the computer spontaneously invents a new and usable alphabet without having been specifically designed to do so is the day I will concede we have hard AI.
Most humans have not spontaneously invented new usable alphabets, so I suppose that means most humans haven't met the bar for true intelligence either.
I still don't understand this obsession for trying to define "hard AI" or "true intelligence" in binary terms. Intelligence is a spectrum, and deep learning has advanced it forward, thus making machines more intelligent -- yes, we can use that word 'intelligent' for computers just as we do for biological machines. Don't freak out.
Is it really so hard to accept that intelligence isn't all-or-nothing?
Intelligence is a spectrum, although creativity is trickier to define. I would argue that most children grow up inventing their own methods of expression (usually pictorial), until they learn the existing ones well enough to communicate satisfyingly. The capacity and drive to create, perceive, and extend is innate.
> but it does not know that "A" is part of the Latin alphabet and that there are other alphabets that form written human languages.
There is nothing preventing the computer from learning those connections, however, so all you are doing is moving the abstraction layer. It's not a fundamental break point.
> The day the computer spontaneously invents a new and usable alphabet without having been specifically designed to do so is the day I will concede we have hard AI. We have a long way to go. Until then it's just a bunch of hotdog/not hotdog classifiers.
I have subsystems in my brain processing the letter A that also do not know that it's part of the Latin alphabet. It's a start, but I agree we have a long, long way to go to hard AI, and I'd be surprised if I see it in my lifetime.
To me the recognizing of the letter is a whole lot more impressive than knowing facts about it (If we're talking about OCR).
In your example a computer could very easily learn those simple facts, just like you did. It's nothing to tell a program how to classify a piece of data. You didn't use some crazy learning when you attributed A to being part of a Latin alphabet, someone just told you that fact and you saved it and classified A into the Latin alphabet.
I have always believed that understanding is an emergent property of physical processes that could be modeled computationally, but I do not think deep learning has demonstrated that it has achieved it yet. Some of the evidence comes from the ways it fails, such as 'recognizing' images that humans would understand are not what the systems think they are, and being confident in decisions that make no sense. These situations occur precisely because of a lack of understanding. I am open to the possibility that deep learning alone might achieve understanding, but I think it is more likely to succumb to the law of diminishing returns before it gets there.
Actually, computers are conscious as well. Consciousness is simply a system of information that operates on a continuous sense/plan/act loop. You could argue that they are "less" conscious, but to say that they are unconscious is to make the same mistake as people have made for years by saying that computers cannot "understand" anything.
Some people push back on this by saying computers have no sense of self. That's not true. Most computers do have internal state representations of themselves. Take a driverless car, for example. When it does localization, it's constantly referencing its own shape and speed and comparing them to the environment. That's a sense of self.
Whatever philosophical barriers we place between ourselves and machines (and animals/nature, for that matter), one thing is certain: they will eventually be debunked.
This is what's often referred to as the easy problem of consciousness—there is also a 'hard problem' (https://en.wikipedia.org/wiki/Hard_problem_of_consciousness). The tricky thing is people often just use 'consciousness' to refer to either, so most discussions of the subject are talking about completely different things.
That's interesting. Thanks for sharing; I'd never come across that term before. The lack of a common definition of consciousness definitely plagues discussions around it.
In this case though, it is my opinion that there is no definite distinction between the two "types" of consciousness you are referring to. In my opinion, all consciousness exists on one vast spectrum. The distinctions between types are just constructs of human thought that were erected to preserve our sense of self and specialness as people.
Here's another way of thinking about it: the non-hard problem of consciousness (I think it's unfair to call it the 'easy' problem—though it's definitely easier) deals with the behavior and structure of a system which has some physical incarnation (whether in circuits or neurons or whatever); the hard problem on the other hand is about the subjectivity involved in being the system—which is no longer a question about the behavior/structure of systems.
I think the difference in type of question is like: if our universe is likened to a board game with some finite set of rules, e.g. Monopoly, the 'physics' of this universe is fully determined by the rules of the game (even if there are non-deterministic aspects where you have to e.g. roll dice). The non-hard problem of consciousness is a question in this realm, like "can I sell one of my properties to another player?"; the hard problem is necessarily outside the scope of the rules; it's a question more like, "what is the molecular composition of a 'Chance' card?". Unfortunately, the question is being asked inside of the game and all that's available in attempt to answer it are the constructs and rules from the game.
It's an interesting thing to consider because in some ways what it discusses is what we have the most direct empirical access to, and yet it's also one of the most clearly unapproachable topics, one we couldn't usefully say anything definite about (except that its inaccessibility is interesting).
Invoking "that's just a construct of human thought" gets you into philosophically dicey territory, especially if you're using it to dismiss/reduce aspects of human thought. The danger is you are implicitly invoking A to prove not-A.
Not exactly. I'm not invoking A; I'm ignoring it on the grounds that it doesn't reflect reality. I don't see any solid, logical grounding for the claim that there are discrete types of consciousness.
Your last claim may be true, but your assertions don't constitute any form of debunking. We don't have an objectively agreed upon way to measure consciousness (though some have been proposed) so making bold claims like "computers are conscious as well" doesn't make much sense until we agree on a way to measure and experiment on the presence of consciousness.
Society has assumed the de facto circular definition of consciousness as "whatever we, as humans, are experiencing". For obvious reasons, this is not a helpful concept.
Instead, I like to think of consciousness in terms of structures and mechanisms of information flow. If we open our minds up to this type of thinking, we can see consciousness in varying degrees in nature, in computers, and of course in people. For anyone who's curious, the guiding light in this school of thought is Hofstadter's Gödel, Escher, Bach.
I upvoted your original comment above even though I don't agree because you're commenting in good faith in my opinion.
Society as a whole has not assumed a concrete definition for consciousness, as there are a lot of people who don't give it a second thought. Among those who do recognize consciousness as such, the word "consciousness" is as close as you can get to being able to indicate the phenomenon. So I can't agree that it's not a helpful concept.
But in fact, the "thing" that is the referent of that term is not a concept at all. It is the pre-conceptual basis or arena for concepts that arise in it. The TV set is not a TV show.
Humans obviously vary in how conscious they are, both from one to another as well as individually over time so it cannot be properly defined circularly as you say above.
All "structures and mechanisms of information flow" are contents of consciousness, not components. Awareness has no qualities, no form, no sides nor parts, it does not experience time: it is always "now", and it is always "here". What you are seeing "in varying degrees in nature, in computers, and of course in people" is mind, I think. Cf. Gregory Bateson, "Mind and Nature: A Necessary Unity" and "Steps to an Ecology of Mind".
Lastly, "Gödel, Escher, Bach" is an excellent book and was instrumental in my own process of coming to grips with consciousness. To wit: I think the closest we can come to modelling or describing consciousness mathematically is as a strange loop involving the entire Universe throughout all time.
The Chinese Room Argument is deeply flawed because it assumes that language translation in humans is a conscious phenomenon. In fact, if you're proficient in a foreign language, you can relate to the fact that for the most part translation happens in the black box of the subconscious mind. The words "bubble out" naturally. The black box of the subconscious mind is no different than the black box of the Chinese room. "Understanding" in the traditional sense is absent from both processes.
If a human can translate perfectly without understanding the conversation, then that to me implies that the mind itself gives no innate intelligence similar to the computer. It must be taught the meaning of things, exactly as a computer would need to be. I'm just not following his logic, it feels like a straw man. Of course the computer doesn't understand the meaning of the symbols it is translating, because it was never given data to teach it that (similar to a human in the scenario).
Perhaps I'm arguing semantics and this is what the author means, but... in your primitive mind, you are able to recognise something even if you have no idea what it is; you can learn to recognise.
The ability to introspect and analyse what makes that thing unique, or to understand what its purpose or origin is, has everything to do with being sentient.
We might not know what exactly being sentient is, but recognising an image is like lobotomising the brain down to just a visual cortex: it can match, but the other networks that work in the abstract are not there.
> Given enough examples, computers can understand what is letter "A" and what is letter "B".
Meh.
Given enough examples, computers can now distinguish letter A from B, but distinguishing is not understanding. You could argue that after learning, the network just uses an instruction set, and from the outside that may leave the impression of understanding, but it really does not. Isn't that basically the Chinese room thing?
In fact recent research indicates that you can randomly relabel the training examples and the network still achieves zero training error (https://arxiv.org/abs/1611.03530). So it is not "understanding" anything intrinsic or fundamental about the letter "A". Rather, it's just storing training examples somewhere inside of its millions of parameters, which sounds a lot less impressive.
That is not a conclusion that can be drawn from the findings in the paper. While the models they evaluate can achieve zero training error on random labels, the test error is obviously not zero: it doesn't generalize at all. However, training on real labels often finds solutions which can generalize quite well.
A better way to summarize the central question of this paper would be: "Why is it that a large-parameter model trained with gradient descent on real data _could_ just memorize all of the training data (it has the capacity) yet finds solutions which generalize well to an unseen test set?"
To say that deep learning is _just_ memorizing its training data would be incorrect. We have empirical evidence to the contrary and this paper is part of that evidence.
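The distinction between fitting the training set and generalizing can be illustrated with a much simpler memorizer than a deep net. The sketch below is an assumption-laden stand-in, not the paper's setup: it uses a 1-nearest-neighbour classifier (which by construction gets zero training error on distinct points) and a hypothetical ground-truth rule "label 1 if x > y" on points in the unit square. With real labels the stored examples carry the rule and test error is low; with random labels training error is still zero but test error sits near chance.

```python
import random

def make_points(rng, n):
    # Points drawn uniformly from the unit square.
    return [(rng.random(), rng.random()) for _ in range(n)]

def true_label(p):
    # Hypothetical ground-truth rule: label 1 if x > y, else 0.
    return 1 if p[0] > p[1] else 0

def nearest_neighbor_predict(train_x, train_y, p):
    # 1-nearest-neighbour: return the label of the closest stored example.
    best = min(range(len(train_x)),
               key=lambda i: (train_x[i][0] - p[0]) ** 2 + (train_x[i][1] - p[1]) ** 2)
    return train_y[best]

def error_rate(train_x, train_y, xs, ys):
    # Fraction of points in (xs, ys) that the memorizer gets wrong.
    wrong = sum(nearest_neighbor_predict(train_x, train_y, p) != y for p, y in zip(xs, ys))
    return wrong / len(xs)

rng = random.Random(0)
train_x = make_points(rng, 200)
test_x = make_points(rng, 200)
real_y = [true_label(p) for p in train_x]
random_y = [rng.randint(0, 1) for _ in train_x]  # labels unrelated to the inputs
test_y = [true_label(p) for p in test_x]         # test set keeps its true labels

results = {}
for name, labels in [("real labels", real_y), ("random labels", random_y)]:
    train_err = error_rate(train_x, labels, train_x, labels)
    test_err = error_rate(train_x, labels, test_x, test_y)
    results[name] = (train_err, test_err)
    print(f"{name}: train error {train_err:.2f}, test error {test_err:.2f}")
```

Both runs memorize the training set perfectly; only the one whose labels actually depend on the inputs transfers to unseen points, which is the gap the paper is asking about.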
But we also have empirical evidence that they generalize incredibly poorly, namely the existence of imperceptible (adversarial) perturbations which can transfer across images and networks and cause the perturbed inputs to be catastrophically misclassified.
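One commonly cited intuition for why tiny perturbations suffice is that many small per-coordinate changes add up in high dimensions. The sketch below assumes a hypothetical linear classifier (predict class 1 when w . x > 0) in 1000 dimensions rather than a real network, and applies a fast-gradient-sign-style step: since the gradient of the score with respect to the input is w, every coordinate is nudged by the same small eps against sign(w). It does not demonstrate transfer across networks.

```python
import math
import random

rng = random.Random(1)
dim = 1000
# Hypothetical linear classifier weights and one input, both standard normal.
w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]

def score(v):
    # Linear score; its sign is the predicted class.
    return sum(wi * vi for wi, vi in zip(w, v))

s = score(x)
direction = math.copysign(1.0, s)
# Moving each coordinate by eps against sign(w) changes the score by
# eps * sum(|w_i|); pick the smallest such eps plus a 1% margin so the sign flips.
eps = 1.01 * abs(s) / sum(abs(wi) for wi in w)
x_adv = [xi - eps * direction * math.copysign(1.0, wi) for xi, wi in zip(x, w)]

print(f"original score {s:.2f}, perturbed score {score(x_adv):.2f}")
print(f"per-coordinate change {eps:.4f} (inputs have std 1.0)")
```

The required per-coordinate change shrinks roughly like 1/sqrt(dim) for this setup, so the larger the input, the less perceptible the perturbation needed to flip the prediction.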
Adversarial examples don't really support the claim that deep models are just memorizing examples. If they were, they wouldn't generalize to unseen examples at all. However, the human brain is also susceptible to adversarial examples (e.g. optical illusions). Yet human brains still generalize quite well. Likewise, deep learning can both suffer from adversarial examples and generalize well.
Generalization is a multi-axis scale, not a switch: you can have more or less generalization in many different dimensions. Being terrible at adversarial examples just means that axis is weak.
Generalization error is a number. You can debate the probability space over which it should be computed, but it exists in one dimension. Otherwise statements like "generalize well" make no sense. Anyway, I'm not aware of any optical illusion that can make the brain confuse a house cat with a doorknob. Yet it is apparently possible to make any one of a number of neural nets do so, using the same technique and with only subtle, imperceptible changes to the input. So they cannot possibly be learning any sort of intrinsic representation of these objects. I'm guilty of being facetious in using the word "just", since of course deep nets (and all other serious ML algorithms I know of) are able to generalize to an extent. What's not clear to me is whether this represents some paradigm shift in AI, as is often claimed, or whether it's simply the consequence of fitting a hugely overparameterized function approximator to a web-scale amount of training data.
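One standard way to make "generalization error is a number" precise, assuming 0-1 loss, a classifier f, a data distribution D, and a training sample S of n examples, is the following; the "probability space" being debated is the choice of D.

```latex
R_{\mathcal{D}}(f) = \Pr_{(x,y)\sim\mathcal{D}}\left[ f(x) \neq y \right],
\qquad
\hat{R}_S(f) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\left[ f(x_i) \neq y_i \right],
\qquad
\mathrm{gap}(f) = R_{\mathcal{D}}(f) - \hat{R}_S(f)
```

Each quantity is a single scalar for a fixed D; choosing a different D, for example one that includes adversarially perturbed inputs, yields a different scalar rather than an extra dimension of the same one.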
>"Specifically, we take a candidate architecture and train it both on the true data and on a copy of the data in which the true labels were replaced by random labels. In the second case, there is no longer any relationship between the instances and the class labels. As a result, learning is impossible."
This is like saying learning someone's phone number is impossible because there is no relationship between the person and the number.
It's more like giving you a bag of random house numbers and instructing you to place the numbers on the correct houses in an area you've never been to before. An instructor teaches you where some of the numbers go, and you can memorize those examples, but when the instructor leaves you to finish the job on your own you have no way of knowing how to assign the remaining numbers.
Memorization is pretty easy. Generalizing from past examples requires that there be a relationship not just between one person and their phone number but between all people and their phone numbers.