by p1esk
3410 days ago
Which existing neuromorphic computers achieve 10^14 ops/s at 20 W? If you compare them to GPUs, those "ops" better be FP32 or at least FP16. Also, you forgot to tell us what is that "extremely concrete reason why current neural net architectures will NOT work with the above optimizations".
The comparison is of 3-bit neuromorphic synaptic ops against FP8 Pascal ops. That factor is important (it means the neuromorphic ops are less useful), but it turns out to be dwarfed by the answer to your second question:
> Also, you forgot to tell us what is that "extremely concrete reason why current neural net architectures will NOT work with the above optimizations".
This is rather difficult to justify in this margin. But the idea is that proposals such as those above (50 Tops) tend to be optimistic about the efficiency of the raw compute ops, and they really don't have much to say about the costs of communication (e.g. reading from memory, transmitting along wires, storing in registers, using buses, etc.). If you don't have good ways to reduce these costs directly (and there are some, such as swapping registers for SRAMs, but nothing like the 100x speedup from analog computing), you have to change the ratio of ops to bit*mm of communication per second instead. There are lots of easy ways to do that (e.g. just spin your ops over and over on the same data), but the real question is how to get useful intelligence out of your compute when it is data starved. This is an open question, and (sadly) very few people are working on it compared to, say, low-bit-precision neural nets. But I predict this sentiment will change over the next few years.
Edit for below: to my knowledge, no one is suggesting 50 Tops/W hardware running AlexNet software (though I would love to hear what they are proposing to run at that efficiency). Nvidia, among others, is squeezing out efficiency for CV applications with current software, but this comes at the cost of generality (it's unlikely that the communication tradeoffs they're making on that chip will make sense for generic AI research), and further improvements will rely on broader software changes, especially ones revolving around reduced communication. There are a lot of interesting ways to reduce communication without sacrificing performance, such as using smaller matrix sizes, which would reverse the state-of-the-art trends.
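To make the ops-per-communication ratio above a bit more concrete, here is a rough Python sketch. It counts the ops in a dense matrix multiply, a lower bound on the bits that have to move, and the share of energy that goes to communication. Everything in it is an assumption for illustration: the function names are made up, a multiply-accumulate is counted as 2 ops, each input is assumed to be read once and each output written once, and the per-op and per-bit*mm energies are parameters you supply, not measurements of any real chip.

```python
def matmul_ops(m, k, n):
    """Ops in an (m x k) @ (k x n) product, counting each multiply-accumulate as 2 ops."""
    return 2 * m * k * n


def matmul_min_bits_moved(m, k, n, bits_per_value):
    """Lower bound on traffic: every input read once, every output written once."""
    return (m * k + k * n + m * n) * bits_per_value


def ops_per_bit(m, k, n, bits_per_value):
    """Compute-to-communication ratio for one matrix multiply."""
    return matmul_ops(m, k, n) / matmul_min_bits_moved(m, k, n, bits_per_value)


def communication_fraction(ops, bits, mm, energy_per_op, energy_per_bit_mm):
    """Share of total energy spent moving data.

    energy_per_op and energy_per_bit_mm are hypothetical inputs in the same
    (arbitrary) energy unit; mm is the assumed average distance each bit travels.
    """
    compute = ops * energy_per_op
    communication = bits * mm * energy_per_bit_mm
    return communication / (compute + communication)


if __name__ == "__main__":
    # Matrix-matrix: each value is reused many times, so ops per bit is high.
    print(ops_per_bit(1024, 1024, 1024, bits_per_value=8))
    # Matrix-vector: each weight is used once, so the op is data starved.
    print(ops_per_bit(1024, 1024, 1, bits_per_value=8))
```

For a square n x n multiply the ratio works out to 2n / (3 * bits_per_value), so it grows with n, while the matrix-vector case stays near 2 / bits_per_value no matter how large the matrix is. The sketch only models the bits term; the mm term (how far each bit travels) is left as an input, and nothing here says which of the two dominates on real hardware.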