by howlgarnish 2069 days ago
Coral is powered by an Edge TPU (Tensor Processing Unit), which wipes the floor with GPUs like the Jetson Nano when it comes to running TensorFlow:

https://blog.usejournal.com/google-coral-edge-tpu-vs-nvidia-...

...and Google is pretty invested in TPUs, since it uses lots of them in house.

https://en.wikipedia.org/wiki/Tensor_Processing_Unit


They might be great for inference with TensorFlow, but from what I can tell from Google's documentation, Coral doesn't support training at all.

I'm sure an ML accelerator that doesn't support training will be great for applications like mass-produced self-driving cars. But for hobbyists - the kind of people who care about the difference between a $170 dev board and a $100 dev board - being unable to train is a pretty glaring omission.

You wouldn't want to use it for training: This chip can do 4 INT8 TOPS at 2 watts. A Tesla T4 can do 130 INT8 TOPS at 70 watts, and 8.1 FP32 TFLOPS.

Assuming that ratio holds (8.1 FP32 TFLOPS per 130 INT8 TOPS, applied to 4 TOPS), you'd maybe get about 250 GFLOPS for training. The Nvidia GeForce 9800 GTX that I bought in 2008 gets 432 GFLOPS according to a quick Google search.
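For anyone who wants to redo the back-of-the-envelope scaling, here is a minimal sketch. The constants are just the figures quoted in this comment (not independently verified), and `scaled_fp32_gflops` is a made-up helper name. It assumes, as the comment does, that the T4's FP32-to-INT8 ratio would carry over to the Edge TPU, which has no FP32 units in reality.

```python
# Figures as quoted in the comment; treat them as assumptions, not measurements.
EDGE_TPU_INT8_TOPS = 4.0
EDGE_TPU_WATTS = 2.0
T4_INT8_TOPS = 130.0
T4_FP32_TFLOPS = 8.1
T4_WATTS = 70.0


def scaled_fp32_gflops(int8_tops, ref_int8_tops, ref_fp32_tflops):
    """Scale a chip's INT8 throughput by a reference chip's FP32/INT8 ratio."""
    ratio = ref_fp32_tflops / ref_int8_tops  # TFLOPS of FP32 per TOPS of INT8
    return int8_tops * ratio * 1000.0        # convert TFLOPS to GFLOPS


estimate = scaled_fp32_gflops(EDGE_TPU_INT8_TOPS, T4_INT8_TOPS, T4_FP32_TFLOPS)
edge_tops_per_watt = EDGE_TPU_INT8_TOPS / EDGE_TPU_WATTS
t4_tops_per_watt = T4_INT8_TOPS / T4_WATTS

print(f"hypothetical Edge TPU FP32 throughput: {estimate:.0f} GFLOPS")
print(f"Edge TPU: {edge_tops_per_watt:.2f} INT8 TOPS/W")
print(f"Tesla T4: {t4_tops_per_watt:.2f} INT8 TOPS/W")
```

The same numbers also show why the board is pitched at inference: per watt, the two chips are in the same ballpark for INT8, but the absolute throughput differs by more than 30x.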

Hobbyists don't care about power efficiency for training, so buy any GPU made in the last 12 years instead, train on your desktop, and transfer the trained model to the board.

On the other hand, it would be useful for people experimenting with low-compute online learning. Also, those types of projects tend to have novel architectures that benefit from the generality of a GPU.
Last I heard, COVID was making GPUs about as hard to find as everything else it has jacked the prices up on.
You can get pretty much any GPU at pre-COVID prices right now, except for the newest generation NVIDIA GPUs that just came out to higher-than-expected demand.
As a hobbyist in a state with relatively high electricity prices, I do care about the power efficiency of training.
Training is what the cloud is for.
That makes a $170 board that can also do training look dirt cheap in comparison.
Good luck training anything in any reasonable time on it.
Useful for adapting existing models. Not everything needs millions of hours of input.
If you want to train yet another convnet, sure, but there could be applications where you want to train directly on a robot with live data, as in interactive learning.

See this paper for an example of interactive RL: https://arxiv.org/abs/1807.00412

Or a highly rigged machine. This looks more like it's meant for fast real-time ML inference on the edge.
You can adapt the final layer of weights on the Edge TPU.
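To make "adapting the final layer" concrete, here is a rough sketch of the general idea: the pretrained network stays frozen and only a small softmax layer on top of its output embeddings is trained. This is plain Python running on a host, not the Coral API. The embeddings, `train_last_layer`, and `predict` are all hypothetical stand-ins for illustration.

```python
import math


def _logits(W, b, x):
    """Linear scores for one embedding vector x."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + bias
            for row, bias in zip(W, b)]


def train_last_layer(features, labels, num_classes, lr=0.5, epochs=200):
    """Fit a softmax layer with SGD on fixed (frozen-backbone) embeddings."""
    dim = len(features[0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            scores = _logits(W, b, x)
            top = max(scores)                      # subtract max for stability
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            for c in range(num_classes):
                # Gradient of cross-entropy w.r.t. logit c: p_c - 1[c == y]
                grad = exps[c] / total - (1.0 if c == y else 0.0)
                for i in range(dim):
                    W[c][i] -= lr * grad * x[i]
                b[c] -= lr * grad
    return W, b


def predict(W, b, x):
    scores = _logits(W, b, x)
    return scores.index(max(scores))


# Hypothetical 2-D embeddings that a frozen backbone might emit for two classes.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
W, b = train_last_layer(features, labels, num_classes=2)
predictions = [predict(W, b, x) for x in features]
print(predictions)
```

Because only a tiny layer is updated, a handful of labelled examples is enough, which is why this kind of adaptation is feasible on hardware that cannot train a full model.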

Training on a dev board should be a last resort.

Even hobbyists can afford to rent GPUs for training on vast.ai or emrys.

Google is pretty invested in TPUs for their own workloads but I fail to see any durable encouragement of them as an external product. At best they're there to encourage standalone development of applications/frameworks to be deployed on Google Cloud (IMHO of course).
AFAIK, apart from toy dev boards like this, you can't buy a TPU, you can only rent access to them in the cloud. I wouldn't want my company to rely on that. What if Google decides to lock you out? If you've adapted your workload to rely on TPUs, you'd be fucked.
What's the difference between Coral's production line of Edge TPU modules and chips [1] and Google's cloud TPU offering?

Note: I haven't tried sourcing these in production (100k+) quantities so I have no idea what guarantees that product line gives customers.

[1] https://coral.ai/products/#production-products

They're nothing alike at all. It's similar to how a low-end laptop GPU differs from a top-of-the-line NVIDIA datacenter offering. Google's Cloud TPU offering is the strongest ML training hardware that exists; the edge devices simply support the same API.
Edge TPU is 2 TFLOPS at half precision; Cloud TPU starts at 140 TFLOPS single precision and scales further.

Also, the Edge TPU draws 2-5 W. Supposedly Cloud TPUs are more power efficient than GPUs; for example, the 14 TFLOPS 2080 regularly ran at 300 W.

Coral can only run inference, and is optimized for models using 8-bit integers (via quantization).
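As a rough illustration of what 8-bit quantization means, here is a minimal sketch of affine (scale plus zero point) quantization of a few weights. It is a simplified stand-in and not the converter that Coral's toolchain actually uses. `quantize` and `dequantize` are illustrative names.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers using a scale and a zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must include 0.0 so that zero maps exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0   # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
print(f"scale={scale:.6f} zero_point={zero_point} max_error={max_error:.6f}")
```

Each weight is stored in one byte, and the round trip loses at most about one quantization step, which is the trade-off an INT8-only accelerator asks the model to tolerate.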

A full TPU v2/v3 can train models and use 16/32 bit floats. They also have a Google-specific (?) 16-bit floating point type with reduced precision.
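The reduced-precision 16-bit type mentioned here is presumably bfloat16, which keeps float32's 8 exponent bits but only 7 mantissa bits. A minimal sketch of the difference from IEEE half precision follows. It assumes simple truncation of a float32 to its top 16 bits, whereas real hardware typically rounds to nearest. The helper names are illustrative.

```python
import struct


def to_bfloat16(x):
    """Approximate bfloat16 by keeping only the top 16 bits of the float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    truncated = bits & 0xFFFF0000              # drop the low 16 mantissa bits
    return struct.unpack(">f", struct.pack(">I", truncated))[0]


def to_float16(x):
    """Round-trip through IEEE half precision (struct format 'e')."""
    return struct.unpack(">e", struct.pack(">e", x))[0]


# Precision: float16 has more mantissa bits, so it lands closer to 0.1.
bf16_tenth = to_bfloat16(0.1)
fp16_tenth = to_float16(0.1)
print(bf16_tenth, fp16_tenth)

# Range: bfloat16 shares float32's exponent range, float16 does not.
bf16_big = to_bfloat16(1e10)
try:
    to_float16(1e10)
    fp16_overflowed = False
except OverflowError:
    fp16_overflowed = True
print(bf16_big, fp16_overflowed)
```

The trade is less precision for much more range, which tends to suit training, where gradients span many orders of magnitude.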

And don't forget, TPUs are horrible at floating point math! The errors!
Yeah, I've been wondering about charts I've seen comparing TPU model quality to GPU model quality, like here [1], and whether the difference could be due to error correction. At the same time, training on gaming GPUs like the 1080 Ti or 2080 Ti is widely popular, though they lack the ECC memory of the "professional" Quadro cards or the V100. I did think conventional DL wisdom said "precision doesn't matter" and "small errors don't matter", though.

I've noticed this difference in model quality in my own experiments, TPU vs. gaming GPU, but don't know for sure what the cause is. I never did notice a difference between models trained on gaming GPUs and models trained on Quadros. Do you have more info or links?

1: https://github.com/tensorflow/gan/tree/master/tensorflow_gan...

That holds until you want to use PyTorch or another non-TensorFlow framework; then the support goes down dramatically. The Jetson Nano supports more frameworks out of the box quite well, and it ends up being the same CUDA code you run on your big Nvidia cloud servers.
Not only that, Nvidia cares deeply about PyTorch. Visit the PyTorch forums and look at the most upvoted answers. They're all by Nvidia field engineers.
That benchmark appears to compare full-precision FP32 inference on the Nano with uint8 inference on the Coral, so that floor wiping comes with a lot of caveats.
There seems to be more than one Jetson board.