by timlarshanson 1955 days ago
But if your realistically spiking, stateful, noisy biological neural network is non-differentiable (which, so far as I know, is true), then how are you going to propagate gradients back through it to update your ANN-approximated learning rule?

I suspect that, given the small size of synapses, the algorithmic complexity of learning rules (and there are several) is small. Hence, you can productively use evolutionary or genetic algorithms to perform this search/optimization, which I think you'd have to do anyway, due to the lack of gradients or simply due to computational cost. Plenty of research is going on in this field. (Heck, while you're at it, you might as well perform a similar search over wiring topologies and recapitulate our own evolution without having to deal with signaling cascades, transport of mRNA and protein along dendrites, metabolic limits, etc.)
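To make the idea concrete, here is a minimal sketch of a gradient-free search over a small, parameterized local learning rule. Everything in it is an assumption made for illustration: the toy OR task, the four-coefficient rule (`a`, `b`, `c`, `d`), the names `fitness` and `evolve`, and the (1+lambda) selection scheme. It is not taken from any particular paper.

```python
import random

# Toy task (an assumption for illustration): a threshold unit learns logical OR.
PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def fitness(rule, epochs=5, lr=0.1):
    """Train two weights with the candidate local rule, then return accuracy."""
    a, b, c, d = rule
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, target in PATTERNS:
            post = target  # teacher signal clamps postsynaptic activity
            for i, pre in enumerate(x):
                # Local rule: uses only pre- and postsynaptic activity.
                w[i] += lr * (a * pre * post + b * pre + c * post + d)
    # Hard threshold readout: non-differentiable, so no gradients are used.
    correct = sum(
        (1 if w[0] * x[0] + w[1] * x[1] > 0.5 else 0) == target
        for x, target in PATTERNS
    )
    return correct / len(PATTERNS)


def evolve(generations=30, offspring=8, sigma=0.3, seed=0):
    """(1+lambda) search: replace the parent only if the best child is strictly better."""
    rng = random.Random(seed)
    parent = [0.0, 0.0, 0.0, 0.0]
    best = fitness(parent)
    history = [best]
    for _ in range(generations):
        children = [
            [g + rng.gauss(0.0, sigma) for g in parent] for _ in range(offspring)
        ]
        top = max(children, key=fitness)
        top_fitness = fitness(top)
        if top_fitness > best:
            parent, best = top, top_fitness
        history.append(best)
    return parent, best, history


if __name__ == "__main__":
    rule, best, history = evolve()
    print("rule coefficients:", [round(g, 3) for g in rule])
    print("accuracy:", best)
```

The search space here is only four numbers, which is the point of the argument above: if real learning rules are algorithmically simple, even a crude mutate-and-select loop can explore them without ever differentiating through the network.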

Anyway, coming from a biological perspective: evolution is still more general than backprop, even if in some domains it's slower.


This is a good question. I think many "biologically plausible" neural models are willing to make some approximations for the sake of computational tractability (e.g. rate coding instead of spike coding, point neurons and synapses instead of a cable model). As for non-differentiable operations, I think one strategy might be to formulate it as a multi-agent communication problem (e.g. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFil...), where gradients are obtained via a differentiable relaxation or a score-function gradient estimator (e.g. REINFORCE).
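As a rough illustration of the score-function estimator mentioned above, the sketch below estimates the gradient of an expected reward through a single stochastic binary "spike". The Bernoulli unit, the `reward` function, and all names are assumptions chosen to keep the example small; they are not from the linked paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward(spike):
    # Stand-in for any black-box, non-differentiable function of the spike.
    return 1.0 if spike == 1 else 0.0


def reinforce_gradient(theta, n_samples=100_000, seed=0):
    """Monte Carlo estimate of d/dtheta E[reward(s)], s ~ Bernoulli(sigmoid(theta))."""
    rng = random.Random(seed)
    p = sigmoid(theta)
    total = 0.0
    for _ in range(n_samples):
        s = 1 if rng.random() < p else 0
        # Score function: d/dtheta log P(s) = s - p for a Bernoulli with logit theta.
        total += reward(s) * (s - p)
    return total / n_samples


def exact_gradient(theta):
    # Closed form for this toy case: (reward(1) - reward(0)) * p * (1 - p).
    p = sigmoid(theta)
    return (reward(1) - reward(0)) * p * (1.0 - p)


if __name__ == "__main__":
    theta = 0.3
    print("REINFORCE estimate:", reinforce_gradient(theta))
    print("exact gradient:    ", exact_gradient(theta))
```

The estimator never differentiates `reward` or the sampling step; it only needs the log-probability of the sampled spike, at the cost of higher variance than a pathwise gradient.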
You can actually calculate exact gradients for spiking neurons using the adjoint method: https://arxiv.org/abs/2009.08378 (I'm the second author). In my PhD thesis I show how this can be extended to larger problems and to more complicated and biologically plausible neuron models. I agree with the gist of your post though: retrofitting backpropagation (or the adjoint method, for that matter) is the wrong approach. One should rather use these methods to optimise biologically plausible learning rules. The group of Wolfgang Maass has done exciting work in that direction (e.g. https://arxiv.org/abs/1803.09574, https://www.frontiersin.org/articles/10.3389/fnins.2019.0048..., https://igi-web.tugraz.at/PDF/256.pdf).
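A small worked example of why a spike time can have an exact gradient even though the spike itself is a discontinuity. This is not the algorithm from the linked paper, only a one-neuron sketch under stated assumptions: a leaky integrator with constant input `w`, threshold `theta`, time constant `tau`, and V(0) = 0, so that V(t) = w * (1 - exp(-t / tau)). The spike time t* is defined implicitly by V(t*, w) = theta, and the implicit function theorem gives dt*/dw = -(dV/dw) / (dV/dt) at the crossing.

```python
import math


def spike_time(w, theta=1.0, tau=10.0):
    """First threshold crossing of tau*dV/dt = -V + w with V(0) = 0 (needs w > theta)."""
    return tau * math.log(w / (w - theta))


def spike_time_gradient(w, theta=1.0, tau=10.0):
    """dt*/dw from the implicit function theorem applied to V(t*, w) = theta."""
    t_star = spike_time(w, theta, tau)
    decay = math.exp(-t_star / tau)
    dV_dw = 1.0 - decay        # partial of V(t, w) = w * (1 - exp(-t / tau)) w.r.t. w
    dV_dt = (w / tau) * decay  # membrane slope at the crossing, nonzero for w > theta
    return -dV_dw / dV_dt


def finite_difference(w, h=1e-6):
    """Central-difference check of the same derivative."""
    return (spike_time(w + h) - spike_time(w - h)) / (2.0 * h)


if __name__ == "__main__":
    w = 1.5
    print("implicit-function gradient:", spike_time_gradient(w))
    print("finite-difference check:   ", finite_difference(w))
```

The gradient is well defined as long as the membrane potential crosses the threshold with nonzero slope; it blows up as `w` approaches `theta`, where the spike is about to disappear.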
I was aware of Neftci's work, but not your result -- I stand corrected! From that perspective, given that LIF networks are causal systems, of course you can run them in reverse with sufficient memory. As I understand it, the memory in this case is the input synaptic currents at the time of every spike (e.g. which synapses contributed to the spike). This is suspiciously similar to spine and dendritic calcium concentrations. Those variables are usually stored only for a short time, but the hippocampus (at least) is adept at reverse replay, so there is no reason calcium could not be a proxy for the 'adjoint'. Hmm.

Interesting Maass references too. Cheers

I agree that calcium seems like a natural candidate, and I suggest as much in my thesis. Coming from physics, I didn't know about reverse replay in the hippocampus for a long time, but I also have this association now. I would be glad to talk more. Is there a way to reach you?