by Nokinside 3036 days ago
Specialization brings speedups.

TPUv2 is specially optimized for deep learning.

Nvidia's Volta microarchitecture is a graphics processor with additional tensor units. It's a general-purpose GPU (GPGPU) chip designed with graphics and other scientific computing tasks in mind. Nvidia has enjoyed monopoly power in the market, and a single microarchitecture has been enough in every high-performance category.

Next logical step for Nvidia is to develop specialized deep learning TPU to compete with TPUv2 and others.

2 comments

> Next logical step for Nvidia is to develop specialized deep learning TPU to compete with TPUv2 and others

I don't know, this benchmark seems to show the V100 doing pretty well against a specialized ASIC. It may well be that all NVIDIA has to do is cut costs on the V100 to make two V100s about as expensive as the cloud TPUv2. With increased batch size, it looks like two V100s would have performance comparable to TPUv2.
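
To make the price/performance argument concrete, here's a rough sketch of the comparison I have in mind: dollars per million images at a given hourly price and throughput. The function names and the numbers in the usage lines are placeholders I made up for illustration, not benchmark results or real cloud prices.

```python
def cost_per_million_images(hourly_cost, images_per_second):
    """Dollars spent to process one million images at a steady rate."""
    if hourly_cost < 0 or images_per_second <= 0:
        raise ValueError("hourly_cost must be >= 0 and images_per_second > 0")
    images_per_hour = images_per_second * 3600
    return hourly_cost / images_per_hour * 1_000_000


def cheaper_option(options):
    """Return the name of the option with the lowest cost per million images.

    `options` maps a name to a (hourly_cost, images_per_second) pair.
    """
    return min(options, key=lambda name: cost_per_million_images(*options[name]))


# Placeholder inputs only (hypothetical, NOT measurements or real prices).
options = {
    "two V100s": (5.0, 1000.0),
    "cloud TPUv2": (6.0, 1000.0),
}
print(cheaper_option(options))
```

The point is just that at roughly equal throughput, the comparison collapses to hourly price, so the question is whether two V100s can be priced at or below one cloud TPUv2.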

Volta V100 already has "tensor cores" which are basically little matrix multiplication ASICs.
That's what I said.

The microarchitecture has many unnecessary things, and it's not optimized as a whole for deep learning.

I believe it was either the last MICRO* or the one before that when Dally addressed this point. The specialized hardware for graphics ends up comprising such a small portion of the overall chip that it wasn't worth it to remove it. The "GPUs were made for graphics thus aren't good for DL" argument really doesn't hold a lot of water IMO.

* It might've been a different conference now that I think of it.