by lisper
2777 days ago
The problem is not that we're unwilling to tolerate latency. The problem is that the way neural networks are trained is completely different from how humans learn to see. When a neural net is trained, it is shown a static image, the weights are tweaked until the output is correct, and then it is shown a completely different static image and the process is repeated. Neural net learning is iterative, discrete, and supervised.

Human visual learning, by contrast, is continuous and largely unsupervised. We don't see snapshots; we see continuously varying images. Furthermore, we actively interact with the world by manipulating objects and shifting our gaze, and that information is also incorporated into our visual learning.

Finally, humans have very advanced feature detectors built into our brains by evolution. We don't learn to see cats; we have cat-detectors built into our brains by our DNA, which learned to detect cats because that was a useful skill in our ancestral environment, when cats were a lot bigger and could eat us. We do learn that the thing our cat-detectors detect is called a "cat", but we don't "learn" what a cat (or a human) looks like. That's built into our brain wiring. (There are some things whose appearance we do learn, like cars, which obviously didn't exist in the ancestral environment. That's why all humans can tell the difference between a cat and a dog, but not everyone can tell a Honda from a Toyota.)

The point is: the process that humans go through when they learn is completely different from the process that contemporary neural nets go through. No one has yet come up with a theory that combines all of these features of human learning into an implementable algorithm. It will surely happen eventually, but at least a few more conceptual breakthroughs will be needed first. Minor tweaks to back-propagation won't do it.
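
For concreteness, the "show a static example, tweak the weights, move on to the next example" loop described at the top of this comment looks roughly like the sketch below. This is only an illustration under stated assumptions, not any particular framework's method: it uses a single logistic unit instead of a real network, takes one small gradient step per example rather than tweaking "until the output is correct", and the names `train_supervised` and `predict` and the two-pixel toy "images" are made up for the example.

```python
import math
import random

def train_supervised(samples, labels, epochs=200, lr=0.5, seed=0):
    """Per-example gradient descent on a single logistic unit (hypothetical helper)."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(len(samples[0]))]
    b = 0.0
    for _ in range(epochs):                  # iterative: many passes over the data
        for x, y in zip(samples, labels):    # discrete: one static, labeled example at a time
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # current output for this example
            err = p - y                      # supervised: compared against a given label
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]  # tweak the weights
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Threshold the unit's pre-activation at zero (hypothetical helper)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z > 0 else 0

# Toy stand-in for labeled images: 2-pixel inputs labeled with logical OR.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 1, 1, 1]

w, b = train_supervised(samples, labels)
predictions = [predict(w, b, x) for x in samples]
print(predictions)
```

Note that nothing in this loop involves time, motion, or the learner acting on its input: each example is independent of the previous one, which is exactly the contrast with human visual learning drawn above.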