by shaklee3 2976 days ago
I'm not sure what you mean by Google does the entire stack. Nvidia writes all of the major CUDA libraries used behind the scenes in the NN libraries, such as cuDNN, cuBLAS, etc. Nvidia can likely improve their hardware significantly faster and more efficiently than Google can because their entire business depends on it. Google has an incentive to improve their TPU for internal use, but they don't make any money by selling TPU time on GCP yet.
2 comments

> I'm not sure what you mean by google does the entire stack.

Consider that Google has some of the best machine learning researchers, compiler engineers, hardware engineers, and infrastructure in the business working on this.

Huh? Machine learning and infrastructure engineers, yes. Compiler and hardware engineers? No. What gives you reason to believe they have a lead in either of those departments, other than that they have a lot of money? They're forced to use the same foundry as Nvidia, and their hardware team is likely significantly smaller.
Google has been buying up AI resources since well before anyone else and has the strongest and deepest team at this point.

That is why so many of the breakthroughs have come from Google. A great example is winning at Go almost a decade earlier than anyone thought possible.

They probably have two of the strongest teams: the Brain team and the DeepMind team. But all the other engineers and infrastructure at Google are first rate too.

Really, at this point I do not think the $100B in cash is as important as the fact that Google already built the team, and experienced people are now far more difficult to get.

The other advantage for Google is its ability to attract top engineers.

Google just got started a lot earlier on all of this.

Google got started a lot earlier on this? Did you read what you are saying? Nvidia has been making hardware longer than Google has been a company. No, Google does not have a better hardware team. Google has the luxury of making a device that is used for a single purpose that they control. Nvidia made a device that can be used for far more and works on commodity hardware. By the way, DeepMind/AlphaGo uses Nvidia GPUs, so that was an extremely bad example.
BTW, DeepMind now uses TPUs for both training and inference, and from the results we can see why.

https://www.theverge.com/circuitbreaker/2016/5/19/11716818/g... Google reveals the mysterious custom hardware that powers AlphaGo

The hardware is optimized for NNs. Nvidia's dominant focus has been graphics. That is a big difference, and we can see the results in this article.

Plus it benefits from not having the baggage that Nvidia would have.

But you are never going to be able to use a TPU for graphics.

In the end it is about results.

Tensor cores are hardware optimized for NNs. You call it baggage, Nvidia calls it extra revenue, because some people need double precision, and those people are willing to pay a lot of money. So the V100 continues to be the cheapest way to train and do inference on NNs because you can actually amortize the server cost over time. With a TPU, you pay the hourly price forever. TPUs are better only for NN jobs that are short in length, or when you don't have the capital to buy a server. For anything longer, you can buy a Titan V and come out far ahead.
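
To make the buy-versus-rent argument concrete, here is a rough break-even sketch. The function names and the numbers in the usage note are made-up placeholders, not real prices for any card or cloud service, and it ignores power, the host machine, resale value, and any speed difference between the two devices.

```python
def break_even_hours(purchase_cost: float, hourly_rental_rate: float) -> float:
    """Hours of use at which total rental spend equals the purchase cost.

    Hypothetical helper for illustration only; both inputs are assumptions
    supplied by the caller, not real prices.
    """
    if hourly_rental_rate <= 0:
        raise ValueError("hourly_rental_rate must be positive")
    return purchase_cost / hourly_rental_rate


def cheaper_option(purchase_cost: float, hourly_rental_rate: float,
                   hours_needed: float) -> str:
    """Return "buy" if owning costs less than renting for hours_needed, else "rent"."""
    rental_total = hourly_rental_rate * hours_needed
    return "buy" if purchase_cost < rental_total else "rent"
```

With placeholder values of 3000 for the card and 6 per hour for rental, `break_even_hours(3000.0, 6.0)` gives 500 hours. A 100-hour job comes out as "rent" and a 1000-hour job as "buy", which is the shape of the claim above: short jobs favor hourly pricing, long-running ones favor owning the hardware.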

By the way, the Tesla cards have no graphics output, so I'm not sure why you'd say they have graphics baggage.

Google does the applications at scale and then each layer below, and a big one is TF. A great example is the recent release of the new text-to-speech using NNs.
When you use a Google service that uses the TPUs, Google is indirectly selling the TPUs.