by gzer0
503 days ago
I spent time working with Andrej and the rest of the FSD team back in 2020/2021, and we had plenty of conversations on how human visual processing maps onto our neural network architectures.
Our approach, built on transformer-based attention blocks, multi-scale feature extraction, and temporal fusion, mirrors elements of the biological visual cortex (retina → LGN → V1 → V2 → V4 → IT), which break down raw inputs and integrate them over time. It's amazing how closely this synthetic perceptual pipeline parallels the way our own brains interpret the world. The key insight we discovered was that explicitly enforcing brain-like topographic organization (as some academic work attempts, such as this one here) isn't necessary; what matters is having the right functional components that parallel biological visual processing. Our experience showed that the key elements of biological visual processing, like hierarchical feature extraction and temporal integration, emerge naturally when you build architectures that have to solve real visual tasks. The brain's organization serves its function, not the other way around. This was validated by the real-world performance of our synthetic visual cortex in the Tesla FSD stack. Link to the 2021 Tesla AI Day talk: https://www.youtube.com/live/j0z4FweCy4M?t=3010s
It is amazing that the synthetic pipeline that was built to mimic the brain seems to mimic the brain?
That sounds a bit tautological, and besides, I doubt we have really understood exactly how our brain interprets the world.
In general this is definitely interesting research, but worded like this, it smells a bit hyped to me.