by zopf 2927 days ago
I still think the defining moment for ML inference (and maybe even training!) on embedded devices will come when there are viable special-purpose, low-power ML chips.

As much as I hate to do this, I'm going to make a comparison to Bitcoin mining.

Mining is all about optimizing hashes/joule to get the best ROI. We watched it go from CPU -> GPU -> FPGA -> ASIC in the quest for efficiency.

In some ways, we're seeing the same thing in ML model training and inference: CPU -> GPU -> TPU. We're even seeing some special-purpose coprocessors deployed, as in the iPhone X. (https://www.wired.com/story/apples-neural-engine-infuses-the...)

But I think the final leap will come by going from digital execution to application-specific analog computing. If you don't need high precision, you can compute extremely quickly and efficiently using properly-configured analog circuits.
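To make the precision point concrete, here is a rough sketch. It is a digital stand-in, not a model of any analog circuit: it rounds both operands of a dot product to 8 bits and compares the result with the full-precision answer. The names `quantize` and `low_precision_dot`, the vector size, and the value range are all made up for illustration.

```python
import random

def quantize(x, bits, max_abs):
    """Clip x to [-max_abs, max_abs] and round it to a signed grid with `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    clipped = max(-max_abs, min(max_abs, x))
    return round(clipped / max_abs * levels) / levels * max_abs

def dot(a, b):
    """Plain full-precision dot product."""
    return sum(x * y for x, y in zip(a, b))

def low_precision_dot(a, b, bits=8, max_abs=1.0):
    """Dot product computed after rounding both operands to `bits` bits."""
    qa = [quantize(x, bits, max_abs) for x in a]
    qb = [quantize(x, bits, max_abs) for x in b]
    return dot(qa, qb)

# Hypothetical data: 256 weights and inputs drawn uniformly from [-1, 1].
rng = random.Random(0)
weights = [rng.uniform(-1, 1) for _ in range(256)]
inputs = [rng.uniform(-1, 1) for _ in range(256)]

exact = dot(weights, inputs)
approx = low_precision_dot(weights, inputs, bits=8)
print(f"exact={exact:.4f} approx={approx:.4f} error={abs(exact - approx):.4f}")
```

Each rounded value is off by at most half a grid step, so the worst-case error over 256 terms is bounded, and in practice the rounding errors tend to partly cancel. Whether a given network tolerates that error is an empirical question this sketch does not answer.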

IBM is working on this kind of system with their TrueNorth line (https://techcrunch.com/2017/06/23/truenorth/).

It hasn't been proven yet, but I think there is huge potential.

2 comments

I remain unconvinced we'll see ASICs dominating inference. Part of the problem is that even if we're just talking about neural networks, there's a variety of architectures, activation functions, etc. to consider. At this stage, from my own benchmarking, Nvidia's V100 card is close enough to the TPU while allowing much more flexibility in the software stack used.

For inference, GPUs are also pretty damn efficient since it's an embarrassingly parallel task with minimal synchronization (no gradient updates needed). In this case, FPGAs are a far better choice since you can push updates to accommodate new network architectures, activation functions, etc. The TPU instead relies on a matrix-multiplier unit which supports more use cases but won't be as performant on something like an RNN.

I think Microsoft's experience with FPGAs for inference would agree with you.

Currently, they are only allowing external customers to use ResNet-50 with their FPGA-enabled Azure ML.

TrueNorth is 100% digital.
After some investigation, you are correct! Knowing that some of TrueNorth's creators previously worked on mixed-mode systems, I made the assumption that this one was too.

It seems the TrueNorth is indeed fully digital, but takes advantage of the event-driven architecture and peer-to-peer communication between many tiny cores to keep things low-power.

( http://paulmerolla.com/merolla_main_som.pdf for some details )

Thank you for the correction!
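
For anyone wondering why event-driven matters for power, a toy sketch follows. It is not TrueNorth's actual design: the weights, spike pattern, and function names are invented, and counting synapse visits is only a crude proxy for energy. It compares a dense update that touches every synapse on every tick with an event-driven update that touches only the synapses of inputs that fired.

```python
def dense_step(weights, spikes):
    """Visit every synapse each tick, whether or not its input fired."""
    ops = 0
    totals = [0] * len(weights[0])
    for i, fired in enumerate(spikes):
        for j, w in enumerate(weights[i]):
            totals[j] += w * fired
            ops += 1
    return totals, ops

def event_driven_step(weights, events):
    """Visit only the synapses of inputs that actually fired this tick."""
    ops = 0
    totals = [0] * len(weights[0])
    for i in events:
        for j, w in enumerate(weights[i]):
            totals[j] += w
            ops += 1
    return totals, ops

# Hypothetical layer: 8 inputs, 4 outputs, weights in {-1, 0, 1}.
weights = [[(i + j) % 3 - 1 for j in range(4)] for i in range(8)]
spikes = [0, 0, 1, 0, 0, 0, 0, 1]  # only inputs 2 and 7 fire this tick
events = [i for i, fired in enumerate(spikes) if fired]

dense_totals, dense_ops = dense_step(weights, spikes)
event_totals, event_ops = event_driven_step(weights, events)
print(dense_totals == event_totals, dense_ops, event_ops)  # True 32 8
```

Both versions produce the same totals, but the event-driven one does work proportional to the number of spikes rather than the number of synapses, which is the saving when activity is sparse.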