by Russell91 3413 days ago
> Yes, I totally agree. Yann LeCun, Geoff Hinton, Jurgen Schmidhuber and others did unpopular work for a long time.

...

> Until then, I'll be ... rolling my eyes at brain analogies.

Maybe you don't realize this, but these guys made more brain analogies than you can count over the same period to which you attribute their greatness. Meanwhile, they were attacked year after year by state-of-the-art land grabbers saying the same things you just did.

> isn't being presented as basic research on a risky hypothesis.

It is basic research, but it's not a risky hypothesis. Existing neuromorphic computers achieve 10^14 ops/s at 20 W. That's 5 Tops/Watt. The best GPUs currently achieve less than 200 Gops/Watt. Where is the risk in saying that a man-made neuromorphic chip can achieve more per dollar than a GPU? There is no risk, and suggesting that this field somehow has too much risk for advances to be celebrated is absolutely crazy.
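For readers who want to check the arithmetic in the comment above, here is a minimal sketch using only the figures quoted there. The variable names are illustrative, and the comparison deliberately ignores operand precision, which the replies below take up.

```python
# Figures quoted in the comment above; variable names are illustrative.
neuromorphic_ops_per_s = 1e14   # "10^14 ops/s"
neuromorphic_watts = 20.0       # "at 20 W"
gpu_ops_per_watt = 200e9        # "less than 200 Gops/Watt", used as an upper bound

# Efficiency is throughput divided by power draw.
neuromorphic_ops_per_watt = neuromorphic_ops_per_s / neuromorphic_watts

# How far the quoted neuromorphic figure sits above the quoted GPU bound.
ratio = neuromorphic_ops_per_watt / gpu_ops_per_watt

print(f"{neuromorphic_ops_per_watt / 1e12:.1f} Tops/W")
print(f"{ratio:.0f}x the quoted GPU bound")
```

Because the GPU number is stated as "less than", the ratio is a lower bound on the gap under these quoted figures, before any correction for what counts as an "op".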


Non-neuromorphic (analog) deep learning chip startup here. We're forecasting AT LEAST ~50 TOPS/watt for inference.

Sure - I guess it's productive for me to answer why this doesn't disagree with my comment. By the time you get the software to hook up that kind of low bit precision (READ: neuromorphic) compute performance with extreme communication-minimizing strategies (READ: neuromorphic), which will invariably require compute-colocated, persistent storage (READ: neuromorphic) in any type of general AI application, you're not exactly making the argument that neuromorphic chips are a bad idea.

We literally have to start taking neuromorphic to mean some silly semantics like "exactly like the brain in every possible way" in order to disagree with it.

Edit: also, to ground this discussion, there is an extremely concrete reason why current neural net architectures will NOT work with the above optimizations. That's the primary motivation for talking about "neuromorphic", or any other synonym you want to coin, as fundamentally different hardware. AI software ppl need to have a term for hardware of the future, which simply won't be capable of running AlexNet well at all, in the same way that a GPU can't run CPU code well. I think the term "neuromorphic" to describe this hardware is as productive as any.

Which existing neuromorphic computers achieve 10^14 ops/s at 20 W? If you compare them to GPUs, those "ops" better be FP32 or at least FP16.

Also, you forgot to tell us what is that "extremely concrete reason why current neural net architectures will NOT work with the above optimizations".

> Which existing neuromorphic computers achieve 10^14 ops/s at 20 W? If you compare them to GPUs, those "ops" better be FP32 or at least FP16.

The comparison is of 3-bit neuromorphic synaptic ops against FP8 Pascal ops. That factor is important (as it means that the neuromorphic ops are less useful), but it turns out to be dwarfed by the answer to your second question:

> Also, you forgot to tell us what is that "extremely concrete reason why current neural net architectures will NOT work with the above optimizations".

This is rather difficult to justify in this margin. But the idea is that proposals such as those above (50 Tops) tend to be optimistic about the efficiency of the raw compute ops. These proposals really don't have much to say about the costs of communication (e.g. reading from memory, transmitting along wires, storing in registers, using buses, etc.). It turns out that if you don't have good ways to reduce these costs directly (and there are some, such as swapping out registers for SRAMs, but nothing like the 100x speedup from analog computing), you have to just change the ratio of ops / bit*mm of communication per second. There are lots of easy ways to do that (e.g. just spin your ops over and over on the same data), but the real question is how to get useful intelligence out of your compute when it is data starved. This is an open question, and (sadly) very few ppl are working on it, compared to, say, low-bit-precision neural nets. But I predict this sentiment will change over the next few years.

Edit for below: no one is suggesting 50 Tops/W hardware running AlexNet software to my knowledge (though I would love to hear what they are proposing to run at that efficiency). Nvidia, among others, is squeezing efficiency for CV applications with current software, but this comes at the cost of generality (it's unlikely the communication tradeoffs they're making on that chip will make sense for generic AI research), and further improvements will rely on broader software changes, especially revolving around reduced communication. There are a lot of interesting ways to reduce communication without sacrificing performance, such as using smaller matrix sizes, which would reverse the state-of-the-art trends.

Regarding your first answer, it sounds like you're doing an apples-to-oranges comparison here. What are those "synaptic ops"? The Xavier board is announced to be capable of 30 Tops (INT8) at 30 W, so even if your neuromorphic chip does 100 Tops at 20 W, assuming for a second those ops are equivalent to INT3 operations, this makes them very similar in efficiency.
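A rough sketch of the comparison being made here, using only the numbers in this comment. The linear weighting by operand width is an assumption added for illustration, not something stated in the thread, and it is only one crude way to put INT8 and 3-bit ops on a common scale.

```python
def tops_per_watt(tops, watts):
    """Throughput in Tops divided by power draw in watts."""
    return tops / watts

# Figures as quoted in the comment above.
xavier = tops_per_watt(30.0, 30.0)         # INT8 ops
neuromorphic = tops_per_watt(100.0, 20.0)  # treated as INT3 ops, per the comment

# ASSUMPTION (illustrative only): weight each op by its operand width in bits.
xavier_bit_tops = xavier * 8
neuromorphic_bit_tops = neuromorphic * 3

print(f"raw:          {xavier:.1f} vs {neuromorphic:.1f} Tops/W")
print(f"bit-weighted: {xavier_bit_tops:.1f} vs {neuromorphic_bit_tops:.1f} bit-Tops/W")
print(f"bit-weighted ratio: {neuromorphic_bit_tops / xavier_bit_tops:.2f}x")
```

Under this particular weighting the raw 5x gap shrinks to a bit under 2x. A different normalization would give a different number, so treat this as a sanity check on the order of magnitude rather than a measurement.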

And you still haven't answered my second question: what is the reason the future neuromorphic chips won't be able to run current neural net architectures?

I'm not even sure what you are talking about at the end of your comment. The 50Tops/W figure was promised for an analog chip, designed to run modern DL algorithms. Sounds pretty reasonable, and I don't see how your arguments apply to it. Are you saying we can't build an analog chip for DL? Why does it have to be data starved?

Our hardware can run AlexNet...

In an integrated system at 50 tops/watt? How are you going to even access memory at less than 20 fJ per op? Like, you're specifically trying to hide the catch here. If we were to take you at face value, we'd have to also believe that Nvidia is working on an energy-optimized system that is 50x worse for no good reason.

For reference, reading from a very small 1.5 kbit SRAM, which is much cheaper than the register caches in a GPU, costs more than 25 fJ per bit read.
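The 20 fJ figure follows directly from the 50 tops/watt claim, and the sketch below works it through next to the quoted SRAM read cost. The 8-bit operand width and the "one fresh operand read per op, no reuse" model are assumptions made purely for illustration; neither is specified in the thread.

```python
def femtojoules_per_op(tops_per_watt):
    """Energy budget per op implied by an efficiency figure.

    1 W = 1 J/s, so (ops/s) per watt is ops per joule, and its
    reciprocal is joules per op.
    """
    joules_per_op = 1.0 / (tops_per_watt * 1e12)
    return joules_per_op * 1e15  # convert J to fJ

budget_fj = femtojoules_per_op(50.0)  # about 20 fJ per op

sram_fj_per_bit = 25.0  # lower bound quoted in the comment above
operand_bits = 8        # ASSUMPTION: one 8-bit operand fetched per op, no reuse

read_cost_fj = sram_fj_per_bit * operand_bits

print(f"budget per op:     {budget_fj:.0f} fJ")
print(f"one operand read:  {read_cost_fj:.0f} fJ")
print(f"read / budget:     {read_cost_fj / budget_fj:.0f}x")
```

Under these assumptions a single operand fetch already costs about ten times the whole per-op budget, which is the point of the question: hitting the claimed efficiency would require avoiding most such reads, for example through heavy data reuse or by keeping operands where the compute happens.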