Could you elaborate on why you think low energy deep learning was a misguided promise for SNNs? Just came across them for the first time last week and the low energy promise seemed like their most interesting aspect!
Deep learning is fundamentally linear algebra. Spiking networks are fundamentally event-based processors. The two concepts don’t play well together.
Many researchers have been trying hard to shoe-horn deep ANNs into spiking networks for the last 10 years. But this doesn’t change the fact that linear algebra is best accelerated by linear algebra accelerators (i.e. GPUs/TPUs).
Generally, spiking networks will likely have an edge when the signals they process are events in time, for example signal streams from event-based sensors such as silicon retinas. There’s also evidence that event-based control has advantages over its periodically sampled equivalents.
If you bring activation sparsity into the mix, the advantage of SNN processors over GPUs/TPUs becomes clearer. Loss-gradient-based optimisation approaches are great because they give you a tool to include, e.g., sparsity regularisation in the loss. Encouraging sparse activity makes simple linear algebra a poor fit for network activation, and SNN processors a much better fit.
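As a rough illustration of what "sparsity regularisation in the loss" can look like, here is a minimal sketch that adds an L1 penalty on the activations to a task loss. The function name, the `strength` parameter, and the choice of an L1 penalty are assumptions made for illustration; other penalties (e.g. on spike counts) would serve the same purpose.

```python
def sparsity_regularised_loss(task_loss, activations, strength=1e-3):
    """Return the task loss plus an L1 penalty on the activations.

    `strength` is a hypothetical hyperparameter trading task accuracy
    against activity sparsity; it is not taken from any specific paper.
    """
    l1_penalty = sum(abs(a) for a in activations)
    return task_loss + strength * l1_penalty


# Usage: a network with more active units pays a larger penalty.
quiet = sparsity_regularised_loss(1.0, [0.0, 0.0, 0.5])
busy = sparsity_regularised_loss(1.0, [0.9, 0.8, 0.5])
print(quiet < busy)
```

Minimising such a loss by gradient descent pushes many activations towards zero, which is what makes the resulting activity a better match for event-driven hardware.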
But is sparse activation sufficient to motivate the use of SNNs? In my opinion one needs a temporal component as well.
Sparse activations that lack a time component (i.e. are sparse in space but not in time) can be implemented very well without events.
Granted, SNN processors can handle sparse activations better than matrix accelerators. But then again, SNN accelerators might carry lots of SNN overhead that is not required for sparse activations alone.
Edit: A good example of a non-spiking sparse activation accelerator is the NullHop architecture [1].
I agree. The use case needs to justify having state, otherwise the ideal architecture is something like NullHop. Temporal signal processing / vision processing tasks are ideal for SNNs, especially if the inputs can also be sparse.
I agree with these points; however, the main advantage of the method presented in the paper is precisely that both the forward and the backward propagation can be seen as being performed by a network operating on temporally sparse events. We absolutely had event-based sensors and control in mind as a motivation. The fact that you can write down the connectivity of the neurons in terms of a weight matrix does not mean that it can't be sparse. Since you are actually processing one spike at a time (potentially asynchronously), you don't need to implement any matrix multiplication. Current neuromorphic hardware either achieves at least some degree of sparsity in its synaptic crossbars (BrainScaleS-2, SpiNNaker) or largely eliminates them, like Loihi.
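To make the "one spike at a time, no matrix multiplication" point concrete, here is a minimal sketch comparing a dense matrix-vector product with an event-driven update that only touches the weights of inputs that actually spiked. Both function names and the operation counters are hypothetical and for illustration only; this is not the paper's algorithm and it ignores neuron dynamics entirely.

```python
def dense_forward(weights, activity):
    """Dense matrix-vector product: visits every weight, active or not."""
    outputs = [0.0] * len(weights)
    ops = 0
    for i, row in enumerate(weights):
        for j, a in enumerate(activity):
            outputs[i] += row[j] * a
            ops += 1
    return outputs, ops


def event_forward(weights, spike_indices):
    """Event-driven update: each input spike adds one weight column."""
    outputs = [0.0] * len(weights)
    ops = 0
    for j in spike_indices:
        for i, row in enumerate(weights):
            outputs[i] += row[j]
            ops += 1
    return outputs, ops


weights = [[0.5, -1.0, 0.25, 2.0],
           [1.5, 0.5, -0.5, 0.0],
           [0.0, 2.0, 1.0, -1.0]]

# Only input 1 spikes: the binary activity vector is [0, 1, 0, 0].
dense_out, dense_ops = dense_forward(weights, [0.0, 1.0, 0.0, 0.0])
event_out, event_ops = event_forward(weights, [1])
print(dense_out == event_out, dense_ops, event_ops)
```

Both routines produce the same input to the next layer, but the event-driven one does work proportional to the number of spikes rather than to the size of the weight matrix.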
Ultra-low-power neuromorphic processors such as DynapSE[1] have been cross-bar free for several years now, making them a perfect fit for sparse networks (both weight- and activity-sparsity).
[1] https://arxiv.org/abs/1708.04198
Yes, the algorithm you proposed is impressive and has the potential to become a game-changer.
However, I think that MNIST and the Yin-Yang dataset, using latency coding, are not ideal examples to demonstrate its performance.
These datasets are useful to demonstrate nonlinear classification, and it's certainly great to see that the spiking network performs competitively. However, the transformation into a latency code costs time, in terms of computation, and also in terms of representation, before even one item is classified. Perceptron-based ANNs with continuous outputs don't require this step and will always have an edge over spiking networks in such scenarios.
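For readers new to the term, a latency code maps each input value to a single spike time, with stronger inputs spiking earlier. The sketch below uses one simple linear mapping as an assumption; the function name, the `t_max` parameter, and the linear form are illustrative and not necessarily the encoding used in the paper.

```python
def latency_encode(intensities, t_max=1.0):
    """Map intensities in [0, 1] to spike times in [0, t_max].

    Assumed linear scheme: intensity 1.0 spikes at time 0,
    intensity 0.0 spikes at time t_max.
    """
    times = []
    for x in intensities:
        if not 0.0 <= x <= 1.0:
            raise ValueError("intensities must lie in [0, 1]")
        times.append(t_max * (1.0 - x))
    return times


# Usage: brighter pixels fire earlier.
print(latency_encode([0.0, 0.5, 1.0], t_max=10.0))
```

The cost mentioned above shows up here directly: the classifier cannot finish until the relevant spikes have arrived, so weak inputs delay the decision by up to `t_max`.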
I think what the field is really lacking is an ML problem that can leverage spiking networks directly, one that does not require costly conversion of data into a representation that is suitable for spiking networks.
I agree that the choice of task is not ideal. It is something that I struggled quite a bit with, since coming up with a good task can be a lot of work. Unfortunately even some of the "neuromorphic" datasets that are in use can be solved by massive temporal averaging or result in reduced performance of the network relative to "analog" temporal input (e.g. on Google Speech Commands). I'm collaborating with a group that is interested in event-based vision and control, so hopefully this will result in more practical/impressive demonstrations in the future.
I have always wondered whether results on the MNIST digits generalise. One might think it would work, but if you put in some other digits such as 一, 二, 三, 四, would they cluster the same way with t-SNE?