by jegp 1149 days ago
I'm a PhD student working with neuromorphic computing. I like to think about SNNs as RNNs with discretized outputs. The neurons themselves may have some complicated nonlinear dynamics (currents integrating into the membrane voltage somehow, etc.), but they are essentially just stateful transfer functions. The notion of spikes is a crippling simplification, but it's power efficient and you can argue for numerical stability in the limit. So I tend to consider spikes as an annoying engineering constraint in some neuromorphic systems. Brains function perfectly well without them, although at smaller scales (C. elegans).
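
To make the "stateful transfer function" view concrete, here is a minimal sketch of a leaky integrate-and-fire neuron stepped with forward Euler. The function names, the time constants and the threshold are illustrative assumptions, not taken from any particular simulator or chip:

```python
def lif_step(voltage, input_current, dt=1e-3, tau_mem=1e-2, v_th=1.0, v_reset=0.0):
    """One Euler step of a leaky integrate-and-fire neuron (illustrative constants)."""
    # The membrane voltage leaks towards the input current: this is the hidden state.
    voltage = voltage + (dt / tau_mem) * (input_current - voltage)
    # The output is discretized: 1 if the threshold is crossed, else 0.
    spike = voltage >= v_th
    if spike:
        voltage = v_reset
    return int(spike), voltage


def run(inputs):
    """Unroll the neuron over a sequence of inputs, exactly like an RNN cell."""
    voltage = 0.0
    spikes = []
    for x in inputs:
        s, voltage = lif_step(voltage, x)
        spikes.append(s)
    return spikes


weak = run([0.5] * 100)    # voltage settles below the threshold, so no spikes
strong = run([2.0] * 100)  # voltage repeatedly crosses the threshold
print(sum(weak), sum(strong))
```

The state (`voltage`) is carried from step to step and the output is binary, which is all "RNN with discretized outputs" means here.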

The true genius of neuromorphics, in my view, is that you can build analog components that perform neuron integration for free. Imagine a small circuit that "acts" like the stateful transfer function, with physical counterparts to the state variables (membrane voltage, synaptic current, etc.). In such a circuit you don't need transistors to inefficiently approximate your function. Physics is doing the computation for you! This gives you a ludicrous advantage over current neural net accelerators. Specifically, 3-5 orders of magnitude in energy and time, as demonstrated in the BrainScaleS system https://www.humanbrainproject.eu/en/science-development/focu...

Unfortunately, that doesn't solve the problem of learning. Just because you can build efficient neuromorphic systems doesn't mean that we know how to train them. Briefly put, the problem is that a physical system has physical constraints. You can't just read the global state in an NWN and use gradient descent as we would in deep learning. Rather, we have to somehow use local signals to approximate local behaviour that's helpful on a global scale. That's why they use Hebbian learning in the paper (what fires together, wires together), but it's tricky to get right and I haven't personally seen examples that scale to systems/problems of "interesting" sizes. This is basically the frontier of the field: we need local, but generalizable, learning rules that are stable across time and compose freely into higher-order systems.
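
For a sense of what "local" means, here is a minimal sketch of a textbook Hebbian update with weight decay. It is not the rule used in the paper; `hebbian_update`, the learning rate and the decay term are assumptions chosen for illustration. Each weight changes using only the activity of the two neurons it connects, with no global error signal:

```python
def hebbian_update(weights, pre, post, lr=0.1, decay=0.01):
    """Return new weights; weights[i][j] connects presynaptic i to postsynaptic j.

    Each weight sees only pre[i] and post[j] ("fire together, wire together").
    The decay term is an assumed stabilizer that keeps weights from growing forever.
    """
    return [
        [w + lr * pre[i] * post[j] - decay * w for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


weights = [[0.0, 0.0], [0.0, 0.0]]
# Only presynaptic neuron 0 and postsynaptic neuron 0 are active together.
weights = hebbian_update(weights, pre=[1, 0], post=[1, 0])
print(weights)
```

Only the connection between the two co-active neurons is strengthened. Nothing in this rule knows whether the change helped the task, which is exactly why making such rules useful at scale is hard.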

Regarding educational material, I'm afraid I haven't seen great entries for learning about SNNs in full generality. I co-author a simulator (https://github.com/norse/norse/) based on PyTorch with a few notebook tutorials (https://github.com/norse/notebooks) that may be helpful.

I'm actually working on some open resources/course material for neuromorphic computing. So if you have any wishes/ideas, please do reach out. Like, what would a newcomer be looking for specifically?


I don't know anything about SNNs, so I think I'm who you're asking. Something I'm really interested in is if there's any possibility of transferring training from normal NNs to the sort of physically embodied things you're discussing. Like how RWKV trains its weights like it's a transformer but then acts on them like it's an RNN, would it be possible to do training with the sort of large deployed NNs that are all the rage right now, but then somehow instantiate those weights into the hardware you're discussing? I'm guessing it's non-viable as-is because of the discrete nature of SNN function, and rounding up or down probably doesn't work, but I would be interested in anything you have to say on it.

Also, I read years ago about a project that was similarly instantiating NNs physically, but it was using the optical properties of layered plates to perform the equivalent of weights. Do you know anything about that? I don't think it was discrete (I can't see why it would be, operating on light), but I'd be interested in anything you have to say about that too.

If we think about spikes as discretized transfer functions, I would say it's totally viable to "hack" them to represent numerical approximations. In fact, a recent paper demonstrates that exact mappings between ANNs and SNNs exist: https://arxiv.org/abs/2212.12522 There are some pitfalls here, though, and I'm biased against these kinds of methods because they don't use the temporal traces of the neuron integration.
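
As a rough illustration of spikes standing in for numbers, here is the classic rate-coding trick, which is a much simpler idea than the exact mapping in the linked paper and is not meant to reproduce it. A non-leaky integrate-and-fire neuron with subtractive reset fires at a rate that approximates a clipped ReLU of its constant input. The function name `if_rate` and the parameters are assumptions:

```python
def if_rate(x, steps=1000, v_th=1.0):
    """Firing rate of a non-leaky integrate-and-fire neuron driven by constant input x."""
    v = 0.0
    spikes = 0
    for _ in range(steps):
        v += x              # integrate the input
        if v >= v_th:       # at most one spike per time step
            spikes += 1
            v -= v_th       # subtractive reset keeps the remainder
    return spikes / steps


for x in (-0.5, 0.3, 1.5):
    # Approximates min(max(x, 0), 1): negative inputs never fire, large inputs saturate.
    print(x, if_rate(x))
```

This also shows the pitfall: the value is only recovered by counting spikes over many time steps, and the timing of individual spikes carries no information.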

Regarding RWKV, someone actually trained a "SpikeGPT": https://arxiv.org/abs/2302.13939 That's a neat insight, which will be great for porting these models onto energy-efficient devices. But the learning problem is still the most interesting open question to me. If we crack that, we can scale down GPT-like models by several orders of magnitude since we can "re-learn" subproblems instead of "hardcoding" a silly number of permutations, like the present models do. Neuromorphic hardware (brains included) lends itself incredibly well to learning. We just don't know how to exploit that yet.

Regarding the optical layers, are you referring to optical chips like this one https://www.nature.com/articles/s41467-020-20719-7 ? That would be an example of using optics to implement your stateful transfer functions (https://en.wikipedia.org/wiki/Optical_neural_network), but there are several other incredibly promising technologies such as memristors (https://en.wikipedia.org/wiki/Memristor), quantum materials (https://arxiv.org/abs/2204.01832) and even biologically based chips (https://en.wikipedia.org/wiki/Wetware_computer). My take on this is that these technologies exploit different principles of physics to "compute" in some way. But I like to think that our computational theories and principles are independent of the implementation substrates.

There's still a long way to go, but practically speaking, I'm convinced this kind of hardware will have profound consequences for the way that we compute today. We're talking at least 3 orders of magnitude in compute. Imagine ChatGPT running 1000 times as fast. It's ridiculous.

Yes, that Nature paper was exactly it, and thank you for the reference on ANN:SNN mapping. I'm not 100% clear on what you mean by the learning problem. Do you mean trying to run training on these physical neural nets rather than just inference? Or is it just training an SNN conventionally that is difficult for some reason?

And you are so right that it would be ridiculous. It's funny, before I read your comment this morning I was watching a livestream of a comedian talking to an AI-generated character ("Slunt"). I don't know the implementation details, but it would have been something simple like an open source or commercial speech-to-text program, maybe even Whisper, then the text passed through to the OpenAI API (probably GPT-4) with a prompt wrapper to set up the character and setting, then the response received and generated with ElevenLabs or something like that. I can't link it as it was a livestream, but it was the same sort of thing that was used to make this demo: https://youtu.be/u_Zn89_g7ok.

Anyway, the whole time I was watching her talk to this AI character, it was taking quite a while to respond, taking a while to recognise her voice, etc., and I was thinking about what would be required for that interface to be truly conversational. It's just speed. If it were running even ten times faster, that would be closer, but a hundred times faster is probably what would be required to have a genuinely conversational interface.

What you need is for your voice to be recognised and converted to text basically instantly, then have the LLM go over it and respond basically instantly, then have the TTS program start saying it basically instantly as well, and have the software wrapper ready to hear you if you interrupt it and respond to that interruption appropriately, or to interrupt you if it has something to add. That's what would be required for it to truly be natural conversation, because that's the only way you can interrupt it or be interrupted by it in the way that humans do when talking to each other, with the sort of responsiveness that makes it fluid rather than like an intercontinental phone call. I don't think we're going to get that sort of performance improvement any time soon by just continuing to scale regular silicon or ASICs. We need new, specific hardware.

And I know a lot of people might think of the conversational ease of use as not really important, not compared to the capabilities. There's an element of truth to that. But here's the thing: ChatGPT was primarily a UX/UI invention, not a technological one, and ChatGPT is what has driven this insane amount of interest, new use cases, and hype. GPT-3 was nearly as powerful, it was just much more clunky and with various other factors that meant you couldn't just go and use it. Making it easier to use was what made it so much more valuable to people that they actually wanted to use it for their problems. And it will go a long way beyond just making it conversational, too. The 2020s are going to be an absurd decade and we're not even halfway through.