Hacker News
by terminalhealth 2511 days ago
> recurrent projections would need to (1) form a 3rd party connection at every synapse and (2) know which synapses were to blame for the error

The most striking result in this regard is that backprop-like learning still works when the error is sent backwards through fixed random weights (feedback alignment). It works regardless because, on average, the resulting update direction is less than 90° from the one true backprop would give (which is good enough for hillclimbing), and the dynamics play out in a way that the learned forward weights adjust to the random feedback weights:

https://www.nature.com/articles/ncomms13276
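
To make the 90° argument concrete, here is a minimal sketch of the idea on a toy problem. It is not the paper's code: the two-layer linear network, the layer sizes, the learning rate and the helper names (`run_feedback_alignment`, `matvec`, and so on) are all assumptions chosen for illustration. The hidden layer is trained with `B e` (fixed random feedback) in place of `W2^T e` (what backprop would send), and the cosine between the two signals is tracked over the last steps.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rows, cols, scale, rng):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def mean_loss(W1, W2, T, inputs):
    """Average squared error of the linear network W2 @ W1 against the target T."""
    total = 0.0
    for x in inputs:
        y = matvec(W2, matvec(W1, x))
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(y, matvec(T, x)))
    return total / len(inputs)


def run_feedback_alignment(seed=0, steps=3000, lr=0.01, n_in=4, n_hid=6, n_out=2):
    rng = random.Random(seed)
    T = rand_matrix(n_out, n_in, 1.0, rng)    # hypothetical linear target map
    W1 = rand_matrix(n_hid, n_in, 0.1, rng)   # learned forward weights
    W2 = rand_matrix(n_out, n_hid, 0.1, rng)  # learned forward weights
    B = rand_matrix(n_hid, n_out, 0.5, rng)   # fixed random feedback, never trained
    held_out = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(200)]
    initial = mean_loss(W1, W2, T, held_out)
    cosines = []
    for step in range(steps):
        x = [rng.gauss(0, 1) for _ in range(n_in)]
        h = matvec(W1, x)
        e = [a - b for a, b in zip(matvec(W2, h), matvec(T, x))]
        delta_fa = matvec(B, e)                              # random feedback signal
        delta_bp = matvec([list(c) for c in zip(*W2)], e)    # backprop signal W2^T e
        if step >= steps - 200:
            cosines.append(cosine(delta_fa, delta_bp))
        for i in range(n_out):
            for j in range(n_hid):
                W2[i][j] -= lr * e[i] * h[j]         # ordinary delta rule on the top layer
        for i in range(n_hid):
            for j in range(n_in):
                W1[i][j] -= lr * delta_fa[i] * x[j]  # uses B e, not W2^T e
    final = mean_loss(W1, W2, T, held_out)
    return initial, final, sum(cosines) / len(cosines)


if __name__ == "__main__":
    initial, final, alignment = run_feedback_alignment()
    print("loss before:", initial, "loss after:", final)
    print("mean cosine(B e, W2^T e) over the last 200 steps:", alignment)
```

A positive mean cosine at the end corresponds to the "less than 90°" condition: `W2` has drifted so that its transpose roughly agrees with `B`, even though `B` itself was never touched.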

That being said, the brain does seem to simply memorize an awful lot, which must work by a mechanism other than backprop because backprop cannot do one-shot learning. I think one-shot learning is how the brain gets past large discontinuities in the model fitness landscape: it can learn linguistic and logical rules, fragments of general computations, and facts that are discovered by exploration (which includes learning about whether it was Bob or Mike) and passed on culturally. The brain basically outsources the problem of tunneling through large discontinuities to cultural/individual trial-and-error and episodic memory. The greatest consequence is that these bodies of knowledge can concern the improvement of the organization of knowledge itself, resulting in a positive feedback loop in model fitness (especially science/Bayesian updating).

Such bodies of knowledge evolve under a learnability-by-hillclimbing soft constraint, though, which implies they often form a neat latent space in which similar codes correspond to similar meanings/representations. Such a space can easily be learned by stochastic hillclimbing (repetition), because each time the brain processes related information it is nudged towards the latent space that is meant to be learned.

Many parts of the world happen to be learnable in this way because everything is kinda smooth and continuous: small causes tend to have small effects, as everything consists of a myriad of small particles that affect each other in smooth ways if you squint at them. Obviously not everything can be learned this way (that is where the large discontinuities are), which is where brute memorization based on reward and punishment comes in handy.

2 comments

> brain simply memorizes an awful lot which must work by a different mechanism besides backprop because backprop cannot do one-shot learning

If you look through a neuroscience textbook section on memory systems, it's commonly suggested that the hippocampus does the one shot learning and transfers that over time to the cortex. This is backed up by clinical case studies.

> The brain basically outsources the problem of tunneling through large discontinuities, to cultural/individual trial-and-error and episodic memory

That seems like a good strategy. It also reminds me of AlphaGo's Monte Carlo search + neural network training setup. Since the search is non-differentiable, you run lots of simulations and fit a differentiable DL model to the results to approximate a possibly discontinuous landscape.

> If you look through a neuroscience textbook section on memory systems, it's commonly suggested that the hippocampus does the one shot learning and transfers that over time to the cortex. This is backed up by clinical case studies.

The hippocampus's role in episodic memory and consolidation via dreams seems kinda plausible, though I would not put much weight on it. I think dreams are a way of training a GAN-like discrimination between reality and imagination:

http://gershmanlab.webfactional.com/pubs/GenerativeAdversari...

Repetition of any kind likely does improve the model, even if it's merely simulation/dreaming.

> AlphaGo's Monte Carlo search + neural network

I think, in effect, MCTS amounts to something like bagging/boosting/mixture of experts, as it computes a weighted average of the predictions when exploring different branches. But sure, the search mechanism implements a function which a recurrent neural network could probably not discover, as it hides behind substantial discontinuities in the fitness landscape (it's not a structure which you can uncover step by step; you immediately need a tree structure, a search recursion, etc.). The RNN would likely need to conceptualize the search process linguistically (if only subvocally) like humans do, which requires structure for the sequential composition of stable prototypes (symbols), which in turn likely requires a one-shot sequential memory. I think even the human mind does not literally do MCTS (that would require an overhead that the brain is just not capable of), but some heuristic approximation thereof. The brain can simulate MCTS by linguistic means, though, even if it's just words of wisdom like "take counsel with your pillow", which effectively means: explore the hypothesis space some more and let the temporal differences back up better value estimates.
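
To illustrate the "weighted average" reading, here is a heavily simplified sketch of just the backup step. It assumes a single-agent setting (no sign flip between players), fixed simulation results, and no selection rule or neural network; `Node`, `backup`, `visit_weighted_average` and `demo` are hypothetical names, not AlphaGo's.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    @property
    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def backup(path, leaf_value):
    """Propagate one simulation result to every node on the visited path."""
    for node in path:
        node.visits += 1
        node.value_sum += leaf_value


def visit_weighted_average(node):
    """Average of the children's mean values, weighted by their visit counts."""
    total = sum(child.visits for child in node.children.values())
    return sum(c.visits * c.mean_value for c in node.children.values()) / total


def demo():
    root = Node()
    root.children = {"a": Node(), "b": Node()}
    # Made-up simulation outcomes: branch "a" is explored three times, "b" once.
    for move, value in [("a", 1.0), ("a", 0.0), ("a", 1.0), ("b", 0.0)]:
        backup([root, root.children[move]], value)
    return root.mean_value, visit_weighted_average(root)


if __name__ == "__main__":
    print(demo())
```

In this toy case the root's running mean and the visit-weighted average of its children come out the same (0.5 each, up to floating-point rounding), which is the sense in which the backup behaves like an ensemble average over branches, with more-explored branches counting for more.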

Also very interesting is that you can send the error directly to intermediate layers through sparse random projections, without any layerwise backpropagation. This relaxation of the structure of the backward pass makes backprop even more plausible from a biological perspective.

https://arxiv.org/abs/1609.01596

https://arxiv.org/abs/1903.02083
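
A rough sketch of what "directly to intermediate layers" can look like, under stated assumptions: a toy three-layer tanh network, a made-up linear regression target, and dense random feedback matrices `B1` and `B2` for brevity (a sparse variant would zero out most of their entries). None of this is taken from the linked papers' code, and all names are hypothetical.

```python
import math
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rows, cols, scale, rng):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def forward(weights, x):
    W1, W2, W3 = weights
    h1 = [math.tanh(a) for a in matvec(W1, x)]
    h2 = [math.tanh(a) for a in matvec(W2, h1)]
    return h1, h2, matvec(W3, h2)


def sgd_outer(W, delta, inp, lr):
    """In-place update W -= lr * outer(delta, inp)."""
    for i, d in enumerate(delta):
        for j, v in enumerate(inp):
            W[i][j] -= lr * d * v


def run_dfa(seed=0, steps=3000, lr=0.02, n_in=4, n_hid=8, n_out=2):
    rng = random.Random(seed)
    T = rand_matrix(n_out, n_in, 0.5, rng)   # hypothetical target map
    weights = [rand_matrix(n_hid, n_in, 0.3, rng),
               rand_matrix(n_hid, n_hid, 0.3, rng),
               rand_matrix(n_out, n_hid, 0.3, rng)]
    B1 = rand_matrix(n_hid, n_out, 0.5, rng)  # fixed: output error -> layer 1
    B2 = rand_matrix(n_hid, n_out, 0.5, rng)  # fixed: output error -> layer 2
    held_out = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(200)]

    def loss():
        total = 0.0
        for x in held_out:
            y = forward(weights, x)[2]
            total += 0.5 * sum((a - b) ** 2 for a, b in zip(y, matvec(T, x)))
        return total / len(held_out)

    initial = loss()
    for _ in range(steps):
        x = [rng.gauss(0, 1) for _ in range(n_in)]
        h1, h2, y = forward(weights, x)
        e = [a - b for a, b in zip(y, matvec(T, x))]
        # Each hidden layer gets the output error through its own random matrix,
        # scaled by the local tanh derivative; no signal passes layer to layer.
        d2 = [g * (1.0 - h * h) for g, h in zip(matvec(B2, e), h2)]
        d1 = [g * (1.0 - h * h) for g, h in zip(matvec(B1, e), h1)]
        sgd_outer(weights[2], e, h2, lr)
        sgd_outer(weights[1], d2, h1, lr)
        sgd_outer(weights[0], d1, x, lr)
    return initial, loss()


if __name__ == "__main__":
    print(run_dfa())
```

The point of the sketch is the shape of the backward pass: `d1` and `d2` are both computed from `e` alone, so no layer needs to know the forward weights of the layers above it.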