Hacker News
by TACIXAT 2359 days ago
>Human babies don’t get tagged data sets, yet they manage just fine, and it’s important for us to understand how that happens

I do not really understand this. Human babies get a constant stream of labeled information from their parents. Contextualized speech is being fed to them for years. Toddlers repeat everything you say. Is this referring to something else that babies can do?

6 comments

I'm curious to know what you mean by "labelled information".

I'm guessing that what you are calling "labelled information" is various forms of encouragement or discouragement that could be considered positively and negatively "labelled" examples.

If that is the case, linguistics research back in the '70s found that infants get almost no negative examples of, in particular, language. For example, a parent will not correct a child by saying, "no you can't say 'eated' because then you could also say 'sitted'". Instead they will correct by saying "no, you should say 'ate'" etc. That is important because there was a famous proof in inductive inference (the precursor to computational learning theory) that the regular languages, and everything above them in the Chomsky hierarchy, cannot be learned from positive examples alone. And yet, babies eventually learn to speak human languages, which are assumed to be at least context-free. Chomsky used these findings to support his claim of a "universal grammar" or innate language endowment [1].
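To make the "positive examples alone" problem a bit more concrete, here is a toy sketch in Python. It is not Gold's proof, only an illustration of the intuition, and every name in it (`conservative_guess`, `mind_changes`, the two example languages) is made up for this example. A learner that guesses "exactly what I have seen so far" settles down on a finite language but keeps revising forever on an infinite one, while a learner that overgeneralises to the infinite language is never contradicted by any positive example from the finite one.

```python
from itertools import count, cycle, islice


def conservative_guess(seen):
    # Hypothesis: the language is exactly the set of strings seen so far.
    return frozenset(seen)


def mind_changes(stream, steps):
    """Count how often the conservative learner revises its guess."""
    seen, previous, changes = set(), None, 0
    for s in islice(stream, steps):
        seen.add(s)
        guess = conservative_guess(seen)
        if previous is not None and guess != previous:
            changes += 1
        previous = guess
    return changes


def finite_text():
    # Positive examples from the finite language {"a", "aa"}, repeated forever.
    return cycle(["a", "aa"])


def infinite_text():
    # Positive examples from the infinite language a+ = {"a", "aa", "aaa", ...}.
    return ("a" * n for n in count(1))


def in_a_plus(s):
    # Membership test for the overgeneral hypothesis a+.
    return len(s) > 0 and set(s) == {"a"}


finite_changes = mind_changes(finite_text(), 50)      # learner stabilises
infinite_changes = mind_changes(infinite_text(), 50)  # learner never stabilises
# Every positive example of the finite language also fits a+, so a learner
# that wrongly guessed a+ would never see evidence against it.
never_refuted = all(in_a_plus(s) for s in islice(finite_text(), 50))
```

Without a negative example ("'aaa' is not in the language"), nothing in the data separates the overgeneral guess from the correct one.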

If you are talking about multi-class labelling, that's even harder to imagine. In machine learning, a multi-class classifier will map inputs to some set of categorical labels (i.e. a set of integers) but the mapping from those labels to concepts that a human would recognise, such as 1:cat, 2:bat, 3:hat, etc, must be performed manually, because the classifier and humans do not have a shared understanding of what e.g. "cat", "bat" and "hat" mean. The classifier only knows 1,2,3... etc, the human knows that "1 means cat". How would this lack of shared context be resolved between an adult and a baby, so that the adult could provide "multi-class labels"?
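A minimal Python sketch of the point about integer labels, with everything in it invented for illustration (the nearest-centroid `classify` function, the centroid values, and the 0-based `id_to_name` table): the classifier only ever returns an index, and the step that turns the index into "cat" is a lookup table a human wrote by hand.

```python
def classify(features, centroids):
    """Return the integer index of the nearest centroid (squared distance)."""
    distances = [
        sum((f - c) ** 2 for f, c in zip(features, centroid))
        for centroid in centroids
    ]
    return distances.index(min(distances))


# Made-up "trained" parameters: one centroid per class.
centroids = [(0.9, 0.1), (0.1, 0.9), (0.5, 0.5)]

# The classifier knows nothing about this table; a human supplies it.
id_to_name = {0: "cat", 1: "bat", 2: "hat"}

label_id = classify((0.8, 0.2), centroids)  # an integer, nothing more
label_name = id_to_name[label_id]           # meaning added from outside
```

Swapping the entries in `id_to_name` would change nothing about what the classifier computes, which is the missing shared context the question is about.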

___________

[1] Sorry that I don't have any references for all this handy- I can try to dig some up if you're interested, but you could start by reading the wikipedia page on Language Identification in the Limit, which is about the famous result from inductive inference I mention (also known as Gold's result from the man who derived it):

https://en.wikipedia.org/wiki/Language_identification_in_the...

By now, the Chomskian approach to linguistics is no longer unchallenged and there is some doubt about whether the "poverty of stimulus" argument holds any water (see e.g. [1]).

IMHO, modern cognitive science based approaches (such as by Tomasello and others) have a better chance of explaining how language is acquired than the hypothesising of the 70s.

I don't have time now to go into more references, but the question is far from settled.

[1] https://scottthornbury.wordpress.com/2015/06/07/p-is-for-pov...

I agree that the matter is not settled [edit: in the sense that there is criticism of Chomskian linguistics, from linguists] and that there is debate on the poverty of the stimulus and universal grammar etc, but the post you link to is not a very good summary of it. I recommend Alexander Clark's "Linguistic Nativism and the Poverty of the Stimulus" for a good look at the subject from the non-Chomskian point of view.

Note however that, as far as I understand it, there is no controversy about the lack of negative examples of language given to children by their parents.

Fair, I just looked for the first reference I could find. I haven't done any real linguistics in years, although I vividly remember the arguments. Especially that Evans & Levinson article 10 years or so back ("The Myth of language universals") which generated quite some heat. If I have time, I will check out your reference.

Not sure about the negative examples; but language acquisition was never my focus area anyway.

I would just generally be cautious about applying formal language theory too readily to linguistics, that's all I wanted to say.

> How would this lack of shared context be resolved between an adult and a baby, so that the adult could provide "multi class labels"?

The baby would do multi-modal learning - learning the associated sound (name) with an image (object).

I don't think the parent and baby lack a shared context. They are both agents in the same environment, who often interact and cooperate to achieve goals and maximise rewards. The baby understands the world much earlier than it can speak; the context is there.

I don't know if it's a good idea to mix terminology from machine learning (or game theory?) in the subject of human learning, like you do. At some point the analogies become a bit too far-fetched. Parents and babies "are agents who interact to maximise rewards"? That just sounds like taking an analogy and running with it, and then putting it in a rocket and sending it to Mars. We have no idea why and how babies think or decide to behave how they behave.

This is one reason why I'm confused about the OP's use of "labelled information". Clearly that is a term borrowed from machine learning to describe something that happens in the real world- but, what?

There may be some kind of labeling encoded in genes. One thing that it is safe to assume is genetically encoded somehow is that sounds made by your parents/humans around you are worth repeating while other sounds are not.

However, past that, the actual sounds themselves, and any association to meaning, are pretty far from tagged data sets. Stuff like the specifics of language (e.g. that a dog is called 'dog') are definitely learned, and children learn them with typically only a handful of stimuli, often a single one.

For contrast, imagine training a model with raw sound data tagged only with "speech" vs "not speech" (and probably only a few thousand data points at that) and I will be amazed if it can recognize a single word. And babies don't just learn words, they learn their association to things they see and hear, and grammar, and abstract thought.

Do note that it is very likely that human brains can learn all that because they have some good heuristics built in. We definitely know some stuff is "hardware" - object recognition, basic mechanics, recognizing human faces and expression, and others. We are pretty sure higher level stuff is also built in - universal grammar, basic logic, some ability to simulate behavior seen/heard in other humans. This specialized hardware was also most likely learned, but over much, much greater periods of time, through evolution over hundreds of millions of years (since even extremely old animals are capable of picking out objects in the environment, approximating their speed etc).

There seems to be a spectacular underestimation of the amount of training data humans experience.

Not only does socialised human intelligence require at least a decade of formal education, but it also spends a lot of time in a complex 3D environment which is literally hands-on.

It's true some of the meta-structures predispose certain kinds of learning - starting with 3D object constancy, mapping, simple environmental prediction, and basic language abstraction.

But that level gets you to advanced animal sentience. The rest needs a lot of training.

For example - we can recognise objects in photographs, but I strongly suspect we learn 3D object recognition first - most likely with a combination of shape/texture/physics memory and modelling - and then add 2D object recognition later, almost as a form of abstraction.

Human intelligence is tactile, physical, and 3D first, and abstracted later. So it seems strange to me to be trying to make AI start with abstractions and work backwards.

Well, babies start picking out objects within weeks or months after birth. And many birds and mammals are much faster than that. That's not a huge amount of data to learn something so abstract from scratch, especially given the limited bandwidth of our data acquisition.

Furthermore, for other kinds of human knowledge, the learning process is very rarely based on data. After the acquisition of language, we generally seem to learn much more by analogy and deduction than by purely analyzing data. The difference is evident, since we can often pick up facts with a single datapoint, even in small children in kindergarten.

Also, getting back to your point on how we start AI - if you try to take a neural network and throw 3D sensor data at it, and immediately start using its outputs to modify the environment those sensors are sensing, I suspect you will not get any meaningful amount of learning. You probably need a very complex model and set of initial weights to have any chance of learning something like 3D objects and their basic physics (weight, speed and how those affect their predicted position). I would at least bet that you wouldn't get anywhere near, say, kitten accuracy in one month of training.

Related to 3D objects vs 2D, I completely agree.

>> Not only does socialised human intelligence require at least a decade of formal education, but it also spends a lot of time in a complex 3D environment which is literally hands-on.

Note that for most of our history, the majority of humans did not get anything like "formal" education as we mean it today (i.e. going to school). Although adults in hunter-gatherer societies do teach children many things (e.g. which mushrooms are edible etc.) this must be done after a child has learned language - and those kids don't go to school to learn their language, they pick it up as they grow up.

> One thing that it is safe to assume is genetically encoded somehow is that sounds made by your parents/humans around you are worth repeating while other sounds are not.

I don't see how that's safe to assume at all. What one could assume is the level of familiarity and comfort (sight, smell, touch) might be somewhat genetic and gives such inputs precedence. OR it might just be that those sources of information are engaging and animated.

> Do note that it is very likely that human brains can learn all that because they have some good heuristics built in.

Nor do I see this assumption having any weight. Many of the heuristics we take for granted were hard fought; it's just so long ago that we've forgotten the fight. Let's not forget how "little" our species gets over the first few YEARS of child development. If your child can move their body, just about walk and talk a little at TWO WHOLE YEARS in, they're an achiever.

The encoding I was talking about may well be something more abstract than 'imitate humans'. Still, babies don't generally try to imitate the sound of rattles or household sounds nearly as much as speech, so I still conclude that it is a safe assumption that there is something about sounds made by humans that is inherently interesting to them for some reason (instead of being a learned behavior).

Related to the second, the rate at which we learn, and the very specific order we learn things in, points very strongly in the direction that there is some built-in model that we train inside of. For example, essentially all babies first learn intonation before learning words. Also, most words are learned with an extremely small set of examples - at some ages, often hearing a word a single time is enough for the child to learn it (often called 'fast mapping', and related to the 'poverty of the stimulus' argument). This has been mainstream understanding ever since behaviorism fell out of favor due to similar arguments by Chomsky.

> try to imitate the sound of rattles or household sounds nearly as much as speech

Well surely that's a case of the range of the vocal cords? Parrots are another intelligent creature that has better range and they imitate all sorts of sounds.

> Related to the second, the rate at which we learn, and the very specific order we learn things in, points very strongly in the direction that there is some built-in model that we train inside of.

Or that an action like walking requires one to put one foot ahead of the other, all other strategies in attempting to walk end in failure, which is why we don't see them.

I'd like to point out that all humans perceive intonation and it's perceivable outside of language; that's why it's easy to pick up. You don't need language to realise that someone is cross, or happy or sad. However, considering autistic children cannot, then maybe there are some genetic markers at play there at least.

>> Well surely that's a case of the range of the vocal cords? Parrots are another intelligent creature that has better range and they imitate all sorts of sounds.

Parrots (and birds like mynas etc) imitate human sounds and all sorts of sounds, but they don't discriminate between, e.g., the sound made by a train whistle and the sound made by a human carer. I mean that a parrot will not learn to speak a human language by imitating its sounds, any more than it'll learn to speak train by imitating a train whistle.

Human babies don't just imitate their parents' sounds, they figure out what those sounds do and how they come together to form language and express meaning. That is a small miracle that we don't understand at all well and Chomsky is 100% right to speak of scientific wonderment, in its context. It is really mind-blowing that kids can eventually learn to speak without, for the vast majority of children, anyone around them having any idea how to teach a kid to speak in any systematic way. Not to mention the trouble that adults have in learning another language even given formal training in it (which perhaps is further evidence that we really don't know how to teach language, because we don't understand how it works, so again, how can we teach small children to speak a language, but not adults?).

Chomsky's universal grammar is really the simplest answer: children don't learn how to speak a human language, they already know how, and they only have to learn the vocabulary and syntax of the language of their parents. This only presupposes that humans have human biology, and that our biology is responsible for our language ability. We can't learn to fly because we don't have wings and parrots can't learn to speak because they don't have human brains.

[Edit: that it's the simplest answer doesn't mean it's the right answer, only that it's got a damn good chance to be it.]

The range of the vocal cords is a reason why children can't successfully imitate these sounds; it doesn't directly explain why they wouldn't try.

Maybe they do try and we just shrug it off as gurgling. Kids do make funny noises when they're vocalising.

> One thing that it is safe to assume is genetically encoded somehow is that sounds made by your parents/humans around you are worth repeating while other sounds are not.

Well, these sounds come with a face attached and we know babies are hardwired to pay attention to faces.

That may well be the mechanism behind this. I was talking in very general terms, not a specific 'imitate humans' structure in the brain.

The things you’re calling labels are themselves learned data, as we don’t start off with an innate knowledge of language.

Whether that distinction is important or not, I don’t know enough brain science to guess.

I show my 19 month old daughter like three cartoon drawings of owls and she recognises a live one at the bird park instantly, unprompted. We have a way to go.

I believe cartoons are our equivalent of adversarial images. They typically look nothing like (photos of) their namesake and yet we recognise them usually without prompting.

It is my understanding (although I sure don't have any evidence on me) that cartoons and such (at least, the ones where we haven't simply learned that this cartoon means this animal) work by being a picture of what we remember about an animal. Akin to a caricature; the cartoon contains the most salient features. It doesn't work by looking like the actual animal; it works by reacting with how we remember the animal.

Isn't that kind of the same thing? Adversarial examples work by matching what the neural net 'remembers' about the target classification, rather than being a picture of a thing in that class. Neural nets just find different features salient.

I've wondered in the past if we could use black box adversarial methods with Mechanical Turk to generate adversarial examples that work on humans. Maybe they'd end up looking like cartoons?

(Also agreed, some cartoon animals are just informed likeness - for instance Goofy doesn't look anything like a dog, at least to me.)
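For anyone who hasn't seen how adversarial examples are built, here is a minimal Python sketch, assuming a made-up linear "owl detector" rather than a real neural net. The weights, the input and the step size `epsilon` are all invented for illustration. The idea is the gradient-sign trick: nudge every input feature a small amount in whichever direction raises the model's score, so the input moves towards what the model finds salient without ever becoming a picture of an owl.

```python
def score(weights, bias, x):
    """Linear score; the model says 'owl' when this is positive."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sign(v):
    return (v > 0) - (v < 0)


def perturb(weights, x, epsilon):
    # For a linear model the gradient of the score with respect to the
    # input is just the weight vector, so step along its sign.
    return [xi + epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical model and input, chosen only to make the effect visible.
weights = [1.0, -2.0, 0.5]
bias = 0.0
x = [0.2, 0.3, 0.1]

is_owl_before = score(weights, bias, x) > 0
x_adv = perturb(weights, x, epsilon=0.15)
is_owl_after = score(weights, bias, x_adv) > 0
largest_change = max(abs(a - b) for a, b in zip(x_adv, x))
```

Each feature moves by at most `epsilon`, yet the decision flips, because the changes all line up with the features the model weights most heavily.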

>> Akin to a caricature; the cartoon contains the most salient features

The question is - how do we know what are the salient features? How do we figure out that if we make _this_ drawing, it will "remind of" an owl, and if we make _that_ drawing it will "remind of" a dog (or not, as the case may be)? I mean, if we knew that, how humans extract salient or relevant features from their environment, we'd be way ahead on the path to AI.

Humans don't _just_ learn to recognise the things they see, though. They have complex mental models of the objects around them that they can access by choice, and they can make hypotheses about new things and immediately test them. Humans don't get labelled photos of cats, but they see cats in 3D, can interact with them, and can use spatial reasoning and walk around them to completely separate the cat from the background behind it.

I agree with you -- we have little idea how much labeled information is encoded in the genome either.

I rather like to think of the genome as the most well trained model of life we have access to.