by legolassexyman 3252 days ago
> Movidius's NCS is powered by their Myriad 2 vision processing unit (VPU), and, according to the company, can reach over 100 GFLOPs of performance within a nominal 1W of power consumption. Under the hood, the Movidius NCS works by translating a standard, trained Caffe-based convolutional neural network (CNN) into an embedded neural network that then runs on the VPU.

This is sure to save me money on my power bill after marathon sessions of "Not Hotdog."


Ignoring the price tag, this is about half the performance per watt of the Jetson TX2, which can manage around 1.5 TFLOPS on 7.5W.
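
For anyone checking the arithmetic: the "about half" figure holds if you compare performance per watt rather than raw throughput. A quick sketch in Python, assuming only the peak numbers quoted in this thread (vendor figures, not benchmarks; the variable and function names are just illustrative):

```python
# Peak figures as quoted in the thread; treat them as rough vendor numbers.
NCS_GFLOPS = 100.0    # Movidius NCS, "over 100 GFLOPs"
NCS_WATTS = 1.0       # "nominal 1W"
TX2_GFLOPS = 1500.0   # Jetson TX2, "around 1.5TFLOPS"
TX2_WATTS = 7.5


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Return throughput per watt; rejects non-positive power values."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return gflops / watts


ncs_eff = gflops_per_watt(NCS_GFLOPS, NCS_WATTS)
tx2_eff = gflops_per_watt(TX2_GFLOPS, TX2_WATTS)

print(f"NCS: {ncs_eff:.0f} GFLOPS/W")
print(f"TX2: {tx2_eff:.0f} GFLOPS/W")
print(f"Efficiency ratio (NCS/TX2): {ncs_eff / tx2_eff:.2f}")
print(f"Raw throughput ratio (NCS/TX2): {NCS_GFLOPS / TX2_GFLOPS:.3f}")
```

On these numbers the stick comes out at 100 GFLOPS/W against 200 GFLOPS/W for the TX2, so half the efficiency, but only about 1/15 of the raw throughput.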

Interesting that you could use this to accelerate systems like the Raspberry Pi. The Jetson is a pain in the backside to deploy (at a production level) because you need to make your own breakout board, or buy an overpriced carrier.

EDIT: I use the Pi as an example because it's readily available and cheap. There are lots of other embedded platforms, but the Pi wins on ecosystem.

1.5 TFLOPS would have made the TOP500 supercomputer list 12 years ago. That's amazing.
Keep in mind that supercomputers are a lot less specialized than circuits for running neural nets.

12 years ago you could have gotten a stack of 5-8 7800 GTX cards and had 1.5TFLOPS of single precision. 11 years ago you could have had a stack of 5 cards with unified shaders. It's not fair to compare against the significantly more complicated route of getting 100 CPU cores working together with only 1-4 per chip.

But can't you configure the device to do, e.g., fast matrix-vector multiplications instead of inference? I could be wrong, but I suspect that's mostly what people do on supercomputers anyway.
That 1.5 TFLOPS for the TX2 is FP16, while TOP500 is FP64.
But you can do training on a Jetson, whereas the stick only runs inference on pre-trained networks.
You can't really do any reasonable training on a Jetson.
Thanks, worth knowing (I was thinking of getting one in a few months).