There may be some kind of labeling encoded in genes. One thing it is safe to assume is genetically encoded somehow is that sounds made by your parents and the humans around you are worth repeating, while other sounds are not. Past that, however, the actual sounds themselves, and any association with meaning, are pretty far from a tagged data set. The specifics of language (e.g. that a dog is called 'dog') are definitely learned, and children learn them from typically only a handful of stimuli, often a single one. For contrast, imagine training a model on raw sound data tagged only as "speech" vs "not speech" (and probably only a few thousand data points at that): I would be amazed if it could recognize a single word. And babies don't just learn words; they learn how words relate to the things they see and hear, and they learn grammar, and abstract thought.

Do note that human brains can very likely learn all that because they have some good heuristics built in. We definitely know some things are "hardware": object recognition, basic mechanics, recognizing human faces and expressions, and others. We are pretty sure higher-level abilities are also built in: universal grammar, basic logic, and some ability to simulate behavior seen or heard in other humans. This specialized hardware was most likely also learned, but over much, much longer periods of time, through evolution over hundreds of millions of years (even evolutionarily ancient animals can pick out objects in their environment, approximate their speed, etc.).
Socialised human intelligence not only requires at least a decade of formal education; it also develops through years spent in a complex 3D environment that is literally hands-on.
It's true that some built-in meta-structures predispose us to certain kinds of learning, starting with 3D object constancy, mapping, simple environmental prediction, and basic language abstraction.
But that level only gets you to advanced animal sentience. The rest needs a lot of training.
For example, we can recognise objects in photographs, but I strongly suspect we learn 3D object recognition first (most likely through a combination of shape, texture, and physics memory and modelling) and only later add 2D object recognition, almost as a form of abstraction.
Human intelligence is tactile, physical, and 3D first, and abstracted later. So it seems strange to me to be trying to make AI start with abstractions and work backwards.