by dkfellows 2779 days ago
The next generation will have single-precision hardware floats, but that's still at the prototype stage (with little bits of the processor running on a monster FPGA in the lab).

The key, however, is that SpiNNaker is a MIMD system (the cores are really independent of each other, except for a shared clock and chip-level shared co-packaged SDRAM) with a very fancy fast multicast interconnect that's been tuned for handling small source-routed packets without guaranteed delivery (but with guaranteed detection of failure to deliver). It's almost the complete antithesis of MPI, and it is by using that well that we get great performance in neural simulation. (I'm a software developer on the team.)


I haven't had a chance to go back and read the literature or talk to people more deeply, but what I've heard about SpiNNaker recently in conversation and semi-technical talks has been confusing when it comes to comparisons. The distinguishing features as presented are things I expect of large HPC systems.

I don't mean SpiNNaker isn't interesting, and I've been pointing it out as such for years, but it's been basically unknown even relatively locally.

It's basically very different in approach to many modern computers. The cores are slow and low-powered, but the interconnect is very fast for routing small packets to multiple destinations, which means that computational tasks that would otherwise be utterly dominated by communication costs (e.g., neural simulations) become a lot more tractable.

But since it's all done in soft realtime with very low-level code (and no hardware floats in the current hardware generation) and not much of an OS, it's a very unusual platform for people to work with. Much more like programming used to be in the 1980s, if my memory serves me right. (One of the key distinguishing things about SpiNNaker in the field of neuromorphic systems is that it actually has an OS at all. Most competitor systems are purely bare metal, as they're put together by deep hardware hackers without consulting software engineers.)

Is this interconnect IP-like or something else entirely? What’s the story on time delays? Why source routing?
That's an excellent question!

It's not at all like IP. The basic message size is (IIRC) 64 or 96 bits, comprising a system control word, an application header word, and an optional payload word. The application header word identifies the sender of the message (in theory it could describe the destination too, but then we wouldn't have enough space to address much at all) and is used to route the message. Each chip has a very fast masked CAM (SpiNNaker's key intellectual property) that converts the application header word into the destinations to deliver that packet to: one channel to each core on the chip and one channel to each direction in the logical triangular mesh in which the chips are connected. The router is very fast indeed, and very low power, so we can generally count on routing a packet right to the opposite side of the machine in a few milliseconds. I'd have to look up the energy cost of a packet (we've published it, but I forget where). I believe our route planning software takes this delay into account. It also tries to put neurons that communicate with each other close together.
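
To make the masked-CAM idea concrete, here is a minimal Python sketch of a key/mask lookup that turns a header word into a set of destinations. It is an illustration only, not SpiNNaker's actual router: the names `Entry`, `lookup`, and `decode`, the link labels, the bit layout of the route word, and the first-match rule are all assumptions made for the example. Real hardware compares every entry in parallel, and what happens on a miss is left out here.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed layout (not from the thread): one bit per inter-chip link in the
# low bits of the route word, then one bit per on-chip core above them.
LINKS = ["E", "NE", "N", "W", "SW", "S"]  # hypothetical link labels
NUM_LINKS = len(LINKS)


@dataclass(frozen=True)
class Entry:
    key: int    # bit pattern to compare against
    mask: int   # 1 = this bit must match, 0 = don't care
    route: int  # destination bit set (links low, cores high)


def lookup(table: List[Entry], header: int) -> Optional[int]:
    """Return the route of the first entry whose masked key matches.

    A software loop stands in for the CAM's parallel compare.
    Returns None on a miss; miss handling is outside this sketch.
    """
    for entry in table:
        if (header & entry.mask) == (entry.key & entry.mask):
            return entry.route
    return None


def decode(route: int) -> List[str]:
    """Expand a route bit set into readable destination names."""
    dests = [name for i, name in enumerate(LINKS) if (route >> i) & 1]
    core_bits = route >> NUM_LINKS
    core = 0
    while core_bits:
        if core_bits & 1:
            dests.append(f"core{core}")
        core_bits >>= 1
        core += 1
    return dests


# Two hypothetical entries: a narrow one and a broader fallback.
table = [
    Entry(key=0x00012000, mask=0xFFFFF000, route=0b1 | (1 << (NUM_LINKS + 3))),
    Entry(key=0x00010000, mask=0xFFFF0000, route=0b100),
]
```

With this table, `decode(lookup(table, 0x00012ABC))` yields one link plus one local core, which is the multicast fan-out in miniature: the packet carries only the sender's identity, and each chip's table decides where copies go.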

For greater delays than that, we also have a delay slot system (for up to 16 simulation timesteps, which is approximately 16 ms) in our synapse model, and specialized pseudo-neurons that implement longer delays than that on cores that we set aside for the purpose (and which, because they only handle delays, are much easier to make scale).
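
One common way to implement a fixed number of delay slots is a ring buffer indexed by timestep, sketched below in Python. This is an assumption about the general technique, not a description of SpiNNaker's synapse code: the class name `DelayRing`, its methods, and the per-neuron weight accumulation are hypothetical. Only the limit of 16 timesteps comes from the comment above.

```python
MAX_DELAY = 16  # timesteps; the slot count mentioned in the thread


class DelayRing:
    """Ring buffer holding synaptic input that is due in 1..slots timesteps."""

    def __init__(self, n_neurons: int, slots: int = MAX_DELAY):
        self.slots = slots
        self.n_neurons = n_neurons
        # One row of accumulated input per future timestep.
        self.buf = [[0.0] * n_neurons for _ in range(slots)]
        self.now = 0  # index of the slot for the current timestep

    def add_spike(self, target: int, weight: float, delay: int) -> None:
        """Schedule `weight` to reach neuron `target` after `delay` steps."""
        if not 1 <= delay <= self.slots:
            # Longer delays would need another mechanism, such as the
            # dedicated delay cores described in the comment.
            raise ValueError(f"delay must be in 1..{self.slots}")
        self.buf[(self.now + delay) % self.slots][target] += weight

    def step(self) -> list:
        """Advance one timestep; return and clear the input due now."""
        self.now = (self.now + 1) % self.slots
        due = self.buf[self.now]
        self.buf[self.now] = [0.0] * self.n_neurons
        return due
```

Because the slot for the current timestep is cleared as soon as it is read, a delay of exactly 16 steps reuses that slot safely. Anything longer is rejected here, which is the gap the pseudo-neurons on dedicated cores are described as filling.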

We do source routing mainly because this was hardware designed from the beginning to do neural simulation; source routing is a natural way to implement (an abstraction of) axons, as each axon is capable of connecting to many different dendrites. This is very much an abstraction of what happens in reality, but it has worked well for us. Also yes, our routing algorithms most definitely do try to limit the amount of traffic going down each communication link. Since communication during execution is pretty predictable (at least statistically) this is far more practical than with IP, where the dominating factors relate far more to being able to manage the network without knowing its total state.

Wow thanks for the great response. It's really interesting to read about (computer) network fundamentals rethought for neural systems.

> specialized pseudo-neurons that implement longer delays than that on cores that we set aside for the purpose (and which, because they only handle delays, are much easier to make scale)

I'm curious to hear more about that, as I don't recall hearing that previously. I'm a dev on the Virtual Brain, another simulator starting to be used in HBP (CDP8), for which we derive tract length info from human diffusion imaging and use it to introduce time delays. These can be up to 256 ms. On the other hand, we're usually running a few hundred neural masses (or some specialized datasets go up to 515k nodes). Are those numbers feasible with your delay-neurons?