| There seems to be a spectacular underestimation of the amount of training data humans experience. Not only does socialised human intelligence require at least a decade of formal education, but it also spends a lot of time in a complex 3D environment which is literally hands-on. It's true some of the meta-structures predispose certain kinds of learning - starting with 3D object constancy, mapping, simple environmental prediction, and basic language abstraction. But that level gets you to advanced animal sentience. The rest needs a lot of training. For example - we can recognise objects in photographs, but I strongly suspect we learn 3D object recognition first - most likely with a combination of shape/texture/physics memory and modelling - and then add 2D object recognition later, almost as a form of abstraction. Human intelligence is tactile, physical, and 3D first, and abstracted later. So it seems strange to me to be trying to make AI start with abstractions and work backwards. |
Furthermore, for other kinds of human knowledge, the learning process is very rarely based on data. After the acquisition of language, we generally seem to learn much more by analogy and deduction than by purely analyzing data. The difference is evident: we can often pick up a fact from a single data point, and even kindergarten-age children do this.
Also, getting back to your point on how we start AI: if you take a neural network, throw 3D sensor data at it, and immediately start using its outputs to modify the environment those sensors are sensing, I suspect you will not get any meaningful amount of learning. You probably need a very complex model and set of initial weights to have any chance of learning something like 3D objects and their basic physics (weight, speed, and how those affect their predicted position). I would at least bet that you wouldn't get anywhere near, say, kitten-level accuracy in one month of training.
As for 3D versus 2D object recognition, I completely agree.