by joshvm 3252 days ago
Ignoring the price tag, this is about half the performance of the Jetson TX2, which can manage around 1.5 TFLOPS at 7.5 W.
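
For a rough sense of efficiency, here is a back-of-the-envelope sketch that uses only the approximate TX2 figures quoted above (1.5 TFLOPS, 7.5 W). The function name is made up for illustration.

```python
def gflops_per_watt(tflops: float, watts: float) -> float:
    """Throughput per watt in GFLOPS/W, given TFLOPS and watts."""
    return tflops * 1000.0 / watts

# Jetson TX2 figures as quoted in this comment (approximate).
tx2_efficiency = gflops_per_watt(1.5, 7.5)
print(f"TX2: roughly {tx2_efficiency:.0f} GFLOPS/W")
```

The same function can be pointed at any other device once you have its throughput and power draw at the same precision.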

Interesting that you could use this to accelerate systems like the Raspberry Pi. The Jetson is a pain in the backside to deploy (at a production level) because you need to make your own breakout board, or buy an overpriced carrier.

EDIT: I use the Pi as an example because it's readily available and cheap. There are lots of other embedded platforms, but the Pi wins on ecosystem.

2 comments

1.5 TFLOPS would have made the TOP500 supercomputer list 12 years ago. That's amazing.
Keep in mind that supercomputers are a lot less specialized than circuits for running neural nets.

12 years ago you could have gotten a stack of 5-8 7800 GTX cards and had 1.5TFLOPS of single precision. 11 years ago you could have had a stack of 5 cards with unified shaders. It's not fair to compare against the significantly more complicated route of getting 100 CPU cores working together with only 1-4 per chip.
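
As a sanity check on what "5-8 cards for 1.5 TFLOPS" implies per card, here is a small sketch. It only divides the numbers from this comment and does not assert any card's actual spec; the names are illustrative.

```python
TARGET_GFLOPS = 1500.0  # 1.5 TFLOPS single precision, as stated in the comment

def per_card_gflops(cards: int) -> float:
    """GFLOPS each card must deliver for `cards` cards to reach the target."""
    return TARGET_GFLOPS / cards

# The comment gives a range of 5 to 8 cards.
for cards in (5, 8):
    print(f"{cards} cards: {per_card_gflops(cards):.1f} GFLOPS each")
```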

But can't you configure the device to do, e.g., fast matrix-vector multiplications instead of inference? I could be wrong, but I suspect that's mostly what people do on supercomputers anyway.
That 1.5 TFLOPS for the TX2 is FP16, while TOP500 rankings use FP64.
But you can do training on a Jetson, whereas the stick only does inference on pre-trained networks.
You can't really do any reasonable training on a Jetson.
Thanks, worth knowing (was thinking of getting one in a few months)