| > Google has over 4k production NN and the amount of money they save running them at less than 1/2 the cost for their own stuff versus using Nvidia is a huge amount of money. Yes, Google has a large ML deployment, but so does Nvidia, which is not (currently) focused on direct-to-consumer public APIs, but is actually doing deep learning and simulation at scale. The hyperscaler approach to ML is not the only possible way to scale up; Nvidia chose to go the HPC/supercomputing route and basically built their own supercomputer from the ground up. Both approaches have their advantages and drawbacks, but one thing that supercomputing approaches have is a focus on vertical scalability. It's not just about samples/second, but how big an NN can you feasibly build and train? Note that the national research labs are getting into the act, and those supercomputers are basically built in close collaboration with Nvidia [1]. I would really recommend spending some time on their website and watching some of their videos, e.g. [2]. Jensen Huang is completely bought into deep learning and NN and has re-oriented his company towards making sure Nvidia can dominate the space. > They have the data to iterate and Nvidia just does not This is where I fundamentally disagree with you. This was true 3 years ago, but not today, mostly because Nvidia is the default option for ML researchers right now and they are slowly but steadily enticing everyone to collaborate with them (not to mention their self-driving efforts, which generate troves of data directly). > Just take their text to speech sold as a service and the cost of doing on Nvidia would have been prohibitive. That's on their own deployment. Google is #3 in the cloud space right now. It's Nvidia-powered AWS + Azure ML deployments competing against Google, which also deploys V100s as well as TPUs.
Although it's possible for a single vertically integrated player to beat the rest of the market (e.g. Apple) for a long period of time, it's a difficult, risky proposition and it usually helps if they started out with a huge advantage, which Google doesn't seem to have since they're starting at #3 in the cloud space. > They do not have the resources to compete. That is the exact problem. I think, perhaps, you are still imagining the company as it was in 2012 or 2015, but the company's revenues and profits have grown substantially in the past years. Nvidia's market cap is $132bn and they have a profit run rate of about $4bn-5bn / yr. Their R&D spend has averaged about $2bn / yr for the last 5 years or so; in fact they beat AMD/ATI into the ground while spending less on R&D. They could basically triple the amount of money they pour into research if they wanted to. By comparison, Google spends about $15bn/yr on R&D, but that's split across far more projects. > Google does the entire stack and Nvidia does NOT. I'm going to have to strongly disagree with you on that one. Google owns more of the deep learning end-to-end cloud stack, but they do not own more of the hardware, software, or firmware stack for accelerated computing. Which 'ecosystem' matters more, easier access to data (which Google does have) or control of the hardware + frameworks + partnerships, is an open question. I tend to believe the latter, because Nvidia has many options to get its hands on data (they can partner with the other cloud providers), while Google would have to invest quite a bit to compete on Nvidia's terms. The easiest example, which I keep coming back to and which you haven't addressed, is how is Google going to compete on memory fabric and node architecture? Nvidia is out there building NVLink, NVSwitch, and basically their own supercomputing nodes (DGX-2). They are working with ORNL to build some of the largest Volta deployments in the world, so they are rapidly building experience on doing deep learning at large scale as well.
How would Google be able to match this if NN/DL development turns out to scale vertically (and we are seeing this in rapid YoY growth of layer depth and network size in DL)? Again, TF is not really a direct advantage for Google because it runs equally well on Nvidia hardware. If Google is so confident in the TPU winning out, why are they busy deploying Voltas in GCE? If you want to do deep learning today, Nvidia is the go-to option because every deep learning framework runs on CUDA and cuDNN. If I want to use the TPU, I am stuck using GCE + TensorFlow (although Keras / PyTorch may soon have support), but with Nvidia I have the choice between every single cloud provider or my own local deployment, which is always ultimately cheaper than paying for cloud time. Google seems unlikely to sell you a TPU for your own DL box. > Google, Amazon, FB and other big players will do their own silicon It's certainly an interesting space now. MSFT is busy buying FPGAs from Xilinx and Intel/Altera as part of their strategy. Ultimately though, you seem to think that Nvidia is still a niche GPU maker from 2013 or so; it's not, it is larger than Tesla and certainly has more than enough funding, plus a very focused execution team and CEO. > Google could never be what they are today if they had not built their own stuff. Could you imagine the cost of using SAN instead of them creating GFS? I agree that the hyperscalers found significant savings by looking up the stack, but that has limits. They aren't building their own CPUs, for example. Chipmaking is a very, very expensive game. [1] https://insidehpc.com/2018/01/using-titan-supercomputer-acce... [2] https://www.youtube.com/watch?v=Rn73n1HYYNs |
I'm not even sure where they are hosting them or even what they do. How about some color, as you have me curious?
I have watched videos of Jensen, but I also watched an excellent, almost 2 hour presentation from one of their VPs. He said a lot of things that were in the Google TPU paper, which I found a bit funny: how you can use 8 bits and integers for inference, for example. That said to me these guys are trying to catch up.
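For anyone unfamiliar with the 8-bit integer inference point, here is a minimal sketch, assuming symmetric linear quantization (one common scheme; it is not taken from the TPU paper or from the Nvidia presentation). The function names `quantize` and `int8_dot` are made up for illustration.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers with a single shared scale (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int8_dot(weights, activations):
    """Dot product done in integer arithmetic, rescaled to float at the end."""
    qw, sw = quantize(weights)
    qa, sa = quantize(activations)
    acc = sum(w * a for w, a in zip(qw, qa))    # integer multiply-accumulate
    return acc * sw * sa                        # one float rescale per output


weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]
exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(weights, activations)
print(exact, approx)
```

The integer result is only an approximation of the float one; the bet behind 8-bit inference is that a trained network tolerates that small error in exchange for much cheaper multiply-accumulate hardware.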
The problem is Amazon has that data, NOT Nvidia. It is not in Amazon's best interest to help Nvidia; this is my exact point. The entire dynamics of the chip business have changed. You will see Amazon do their own, just like Google has.
Once Google did the gen 1 TPUs they set the direction: you just can NOT buy off the shelf and compete long term.
The silicon is strategic for AI.
MS went the wrong direction in using an FPGA solution in addition to using Nvidia. But once again, no data for Nvidia.
Market cap does not give you the money. But Google in 2018 will spend about 2x Nvidia's 2017 sales! Yes, you read that correctly. Google will spend 2x Nvidia's 2017 sales on R&D!
Google's profits will be over 4x Nvidia's total 2017 sales.
Once again, Nvidia does NOT do the entire stack. I am not aware of any algorithm breakthroughs that came from Nvidia. I cannot even name one AI expert at Nvidia.
But the scoreboard is papers accepted at NIPS. Nvidia did NOT get a single paper accepted at the conference that I saw.
Google, by contrast, had more than anyone: 9% of all the papers accepted came from Google.
https://medium.com/machine-learning-in-practice/nips-accepte...
If Nvidia is playing in the entire stack how could they NOT get a single paper accepted at NIPS?
Or did I miss it?
If we look at self-driving cars, one of the most important AI applications, Nvidia does not even show up on patents. Once again, Google is ahead by a mile.
https://www.theatlas.com/charts/r1iEkmKkz
Something in your post does NOT add up. If Nvidia is a player in the stack beyond the silicon, why do they NOT show up anywhere?
Google deploys both TPUs and Nvidia for a number of reasons, I suspect.
The biggest is that they want TF to be the canonical framework for AI, and they MUST be seen as not favoring their own solution until it is a done deal, which is getting close.
In the end, TF will never run as well on Nvidia as it will on the TPUs. We can see it here, with about 1/2 the cost using the TPUs over Nvidia.
It is like saying Android would run as well as iOS on the Apple processors. It is all about controlling the entire stack like Apple has done and Nvidia is just not in a position to be able to.
It makes no sense to buy the processors, so it would not make any sense for Google to sell them to others. We are never going to see that happen.
But I do think it is possible Google will sell the PVCs.
The ultimate problem is that Nvidia is in perpetual catch-up. Right now the big new thing, which came from Hinton, is capsule networks using dynamic routing. Google will have that optimized in silicon long before Nvidia will.
I suspect it will create the need for a different approach to how you access memory in the chip architecture.
But capsule networks are computationally heavy, so silicon will matter a lot. Google has the algorithms, knows how they want to use them in production at scale, and has the money to execute on supporting them in silicon. They just move way too fast for Nvidia to ever be able to catch up.