

by gzer0 503 days ago

I spent time working with Andrej and the rest of the FSD team back in 2020/2021, and we had plenty of conversations on how human visual processing maps onto our neural network architectures. Our approach—transformer-based attention blocks, multi-scale feature extraction, and temporal fusion—mirrors elements of the biological visual cortex (retina → LGN → V1 → V2 → V4 → IT) which break down raw inputs and integrate them over time. It’s amazing how closely this synthetic perceptual pipeline parallels the way our own brains interpret the world.

The key insight we discovered was that explicitly enforcing brain-like topographic organization (as some academic work attempts - such as this one here) isn't necessary - what matters is having the right functional components that parallel biological visual processing. Our experience showed that the key elements of biological visual processing - like hierarchical feature extraction and temporal integration - emerge naturally when you build architectures that have to solve real visual tasks.

The brain's organization serves its function, not the other way around. This was validated by the real-world performance of our synthetic visual cortex in the Tesla FSD stack.

Link to the 2021 Tesla AI day talk: https://www.youtube.com/live/j0z4FweCy4M?t=3010s

3 comments

lukan 503 days ago

"It’s amazing how closely this synthetic perceptual pipeline parallels the way our own brains interpret the world."

It is amazing that the synthetic pipeline, which was built to mimic the brain, seems to mimic the brain?

That sounds a bit tautological and otherwise I doubt we have really understood how our brain exactly interprets the world.

In general this is definitely interesting research, but worded like this, it smells a bit hyped to me.


Shorel 502 days ago

I interpreted it the other way around.

We can think of a solution space with potentially many good solutions to the vision problem, and we can imagine, in science fiction-like speculation, that the other solutions will be very different and surprise us.

Then this experiment shows its solution is the same we already knew, and that's it.

Then there aren't many good potential solutions, there is only one, and the ocean of possibilities becomes the pond of this solution.


trhway 503 days ago

The convolutional kernels in the first layers do converge to Gabor filters like the ones in V1 (and there was mathematical work in neuroscience research in the 1990s on the optimality of such kernels), so it wouldn't be surprising if higher levels converged to something similar to the higher levels of the visual cortex (like hierarchical feature aggregation, which is nicely illustrated by deep dreaming, also feels like it can be optimal under reasonable conditions, and thus would be expected to emerge).
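For readers who have not seen one, here is a minimal sketch of the standard Gabor function (a Gaussian envelope multiplied by an oriented sinusoid), the shape that first-layer kernels are said to resemble. The function name `gabor_kernel` and all parameter defaults are illustrative assumptions, not values taken from any trained network.

```python
import math

def gabor_kernel(size=7, sigma=2.0, theta=0.0, wavelength=4.0, gamma=0.5, phase=0.0):
    """Return a size x size Gabor kernel as a list of rows (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope; gamma stretches it along the stripes.
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            # Sinusoidal carrier; this is what gives the kernel its stripes.
            carrier = math.cos(2 * math.pi * xr / wavelength + phase)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# Illustrative defaults: a 7x7 kernel whose stripes vary along the x axis.
kernel = gabor_kernel()
```

Changing `theta` rotates the stripes and `wavelength` sets their spacing, which loosely corresponds to the orientation and spatial-frequency tuning described for V1 cells.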


perching_aix 503 days ago

Did you read the part where he explicitly mentioned that they discovered how enforcing that architecture was not necessary, as it would emerge on its own?


lukan 502 days ago

I did, but it was not clear to me how it was meant. I assume the basic design was done beforehand (with the brain in mind).


iandanforth 503 days ago

Unlike neural networks, the brain contains massive numbers of lateral connections. This, combined with topographical organization, allows it to do within-layer temporal predictions as activations travel across the visual field, create active competition between similarly tuned neurons in a layer (forming natural sub-networks), and quite a bit more. So, yeah, the brain's organisation serves its function, and it does so very, very well.


dmarchand90 503 days ago

I've found how CNNs map to the visual cortex to be very clear. But I've always been a bit confused about how LLMs map to the brain. Is that even the case?


nickpsecurity 502 days ago

They probably don’t. They’re very different. LLMs seem to be based on pragmatic, mathematical techniques developed over time to produce patterns from data.

There are at least three fields in this:

1. Machine learning using non-neurological techniques (most stuff). These use a combination of statistical algorithms stitched together with hyperparameter tweaking. Also, usually global optimization by heavy methods like backpropagation.

2. “Brain-inspired” or “biologically accurate” algorithms that try to imitate the brain. They sometimes include evidence that their behavior matches experimental observations of brain behavior. Many of these use complex neurons, spiking nets, and/or local learning (Hebbian).

(Note: There is some work on hybrids such as integrating hippocampus-like memory or doing limited backpropagation on Hebbian-like architectures.)

3. Computational neuroscience which aims to make biologically-accurate models at various levels of granularity. Their goal is to understand brain function. A common reason is diagnosing and treating neurological disorders.

Making an LLM like the brain would require use of brain-inspired components, multiple systems specialized for certain tasks, memory integrated into all of them, and a brain-like model for reinforcement. Imitating God’s complex design is simply much more difficult than combining proven algorithms that work well enough. ;)

That said, I keep collecting work on both efficient ML and brain-inspired ML. I think some combination of the techniques might have high impact later. I think the lower training costs of some brain-inspired methods, especially Hebbian learning, justify more experimentation by small teams with small GPU budgets. We might find something cost-effective in that research. We need more of it on common platforms, too, like Hugging Face libraries and cheap VMs.
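As a rough illustration of the "local learning (Hebbian)" idea mentioned above, here is a minimal sketch of Oja's rule, a normalized Hebbian update that needs no backpropagation. The toy data, learning rate, and function name `oja_train` are assumptions made up for this example, not taken from any of the work referenced in this thread.

```python
import math
import random

def oja_train(samples, lr=0.01, epochs=5):
    """Learn one weight vector with Oja's rule, a normalized Hebbian update."""
    w = [1.0, 0.0]  # arbitrary starting weights
    for _ in range(epochs):
        for x in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))  # neuron output
            # Local update: uses only this neuron's input, output, and weights.
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

# Toy data (made up): two inputs driven by one shared signal plus small noise.
random.seed(0)
samples = []
for _ in range(2000):
    s = random.gauss(0.0, 1.0)
    samples.append((s + random.gauss(0.0, 0.1), s + random.gauss(0.0, 0.1)))

w = oja_train(samples)
norm = math.sqrt(sum(wi * wi for wi in w))
# Cosine between w and the diagonal direction (1, 1), ignoring sign.
alignment = abs(w[0] + w[1]) / (math.sqrt(2) * norm)
```

On this toy data the weights should drift toward the direction in which the inputs co-vary, with a norm near 1. Each update needs only one multiply-accumulate pass per sample and no gradient from any other layer, which is the kind of cost argument made for local rules.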


trhway 503 days ago

> how llms map to the brain

For the lower level - word embeddings (word2vec, "King – Man + Woman = Queen") - one can see a similarity

https://www.nature.com/articles/d41586-019-00069-1 and https://gallantlab.org/viewer-huth-2016/

"The map reveals how language is spread throughout the cortex and across both hemispheres, showing groups of words clustered together by meaning."


nyrikki 502 days ago

That is the latent space.

Very different from a feed-forward network with perceptrons, autograd, etc.

Inner product spaces are fixed points, mapping between models is less surprising because the general case is a merger set IIRC.
