by smallnamespace 2973 days ago
> Google has over 4k production NN and the amount of money they save running them at less than 1/2 the cost for their own stuff versus using Nvidia is a huge amount of money.

Yes, Google has a large ML deployment, but so does Nvidia, which is not (currently) focused on direct-to-consumer public APIs, but actually doing deep learning and simulation at scale.

The hyperscaler approach to ML is not the only possible way to scale up; Nvidia chose to go the HPC/supercomputing route and basically built their own supercomputer from the ground up.

Both approaches have their advantages and drawbacks, but one thing that supercomputing approaches have is a focus on vertical scalability. It's not just about samples/second, but how big can you feasibly make and train an NN? Note that the national research labs are getting into the act, and those supercomputers are basically built in close collaboration with Nvidia [1].

I would really recommend spending some time on their website and watching some of their videos, e.g. [2]. Jensen Huang is completely bought into deep learning and NN and has re-oriented his company towards making sure Nvidia can dominate the space.

> They have the data to iterate and Nvidia just does not

This is where I fundamentally disagree with you. This was true 3 years ago, but not today, mostly because Nvidia is the default option for ML researchers right now and they are slowly but steadily enticing everyone to collaborate with them (not to mention their self-driving efforts, which generate troves of data directly).

> Just take their text to speech sold as a service and the cost of doing on Nvidia would have been prohibitive.

That's on their own deployment.

Google is #3 in the cloud space right now. It's Nvidia-powered AWS + Azure ML deployments competing against Google, which also deploys V100s as well as TPUs.

Although it's possible for a single vertically integrated player to beat the rest of the market (e.g. Apple) for a long period of time, it's a difficult, risky proposition and it usually helps if they started out with a huge advantage, which Google doesn't seem to have since they're starting at #3 in the cloud space.

> They do not have the resources to compete. That is the exact problem.

I think, perhaps, you are still imagining the company as it was in 2012 or 2015, but the company's revenues and profits have grown substantially in the past years.

Nvidia's market cap is $132bn and they have a profit run rate of about $4bn - 5bn / yr.

Their R&D spend has averaged about $2bn / yr for the last 5 years or so; in fact they beat AMD/ATI into the ground while spending less on R&D. They could basically triple the amount of money they pour into research if they wanted to.

By comparison, Google spends about $15bn/yr on R&D, but that's split across far more projects.
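
To spell out the arithmetic behind "triple": a rough sketch using only the round figures quoted above, and assuming (my simplification, not a forecast) that the whole profit run rate were redirected into R&D. The function and constant names are made up for illustration.

```python
# Round figures quoted in this comment, in billions of USD per year.
NVIDIA_RD = 2.0                   # average R&D spend
NVIDIA_PROFIT_RANGE = (4.0, 5.0)  # profit run rate, low and high estimate
GOOGLE_RD = 15.0                  # Google's total R&D, across all projects


def rd_multiple(current_rd, extra):
    """How many times larger R&D becomes if `extra` is added on top of it."""
    return (current_rd + extra) / current_rd


# Assumption: all profit is ploughed into R&D (an upper bound, not a plan).
low, high = (rd_multiple(NVIDIA_RD, profit) for profit in NVIDIA_PROFIT_RANGE)

# The same hypothetical budget expressed as a share of Google's total R&D.
share_of_google = (NVIDIA_RD + NVIDIA_PROFIT_RANGE[0]) / GOOGLE_RD

print(f"R&D multiple: {low:.1f}x to {high:.1f}x")
print(f"Share of Google's total R&D: {share_of_google:.0%}")
```

So "triple" is the low end of that range, and the resulting budget would still be well under half of Google's total, which is why how that total is split across projects matters.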

> Google does the entire stack and Nvidia does NOT.

I'm going to have to strongly disagree with you on that one.

Google owns more of the deep learning end-to-end cloud stack, but they do not own more of the hardware, software, or firmware stack for accelerated computing.

Which 'ecosystem' matters more, easier access to data (which Google does have) or control of the hardware + frameworks + partnerships, is an open question. I tend to believe the latter, because Nvidia has many options to get its hands on data (they can partner with the other cloud providers), while Google would have to invest quite a bit to compete on Nvidia's terms.

The easiest example, which I keep coming back to and which you haven't addressed, is how is Google going to compete on memory fabric and node architecture? Nvidia is out there building NVLink, NVSwitch, and basically their own supercomputing nodes (DGX-2).

They are working with ORNL to build some of the largest Volta deployments in the world, so they are rapidly building experience in doing deep learning at large scale as well. How would Google be able to match this if NN/DL development turns out to scale vertically (and we are seeing this in the rapid YoY growth of layer depth and network size in DL)?

Again, TF is not really a direct advantage for Google because it runs equally well on Nvidia hardware. If Google is so confident in the TPU winning out, why are they busy deploying Voltas in GCE?

If you want to do deep learning today, Nvidia is the go-to option because every deep learning framework is built on CUDA and cuDNN. If I want to use the TPU, I am stuck using GCE + TensorFlow (although Keras / PyTorch may soon have support), but with Nvidia I have the choice between every single cloud provider or my own local deployment, which is always ultimately cheaper than paying for cloud time. Google seems unlikely to sell you a TPU for your own DL box.

> Google, Amazon, FB and other big players will do their own silicon

It's certainly an interesting space now. MSFT is busy buying FPGAs from Xilinx and Intel/Altera as part of their strategy. Ultimately though, you seem to think that Nvidia is still a niche GPU maker from 2013 or so; it's not, it is larger than Tesla and certainly has more than enough funding, plus a very focused execution team and CEO.

> Google could never be what they are today if they had not built their own stuff. Could you imagine the cost of using SAN instead of them creating GFS?

I agree that the hyperscalers found significant savings by looking up the stack, but that has limits. They aren't building their own CPUs, for example. Chipmaking is a very, very expensive game.

[1] https://insidehpc.com/2018/01/using-titan-supercomputer-acce...

[2] https://www.youtube.com/watch?v=Rn73n1HYYNs


Not aware of Nvidia having anywhere near the number of neural networks in production or nearly the number of users.

Not even sure where they are hosting them or even what they do? How about some color as you have me curious?

Have watched videos of Jensen but also watched an excellent almost 2 hour presentation from one of their VPs. He said a lot of things that were in the Google TPU paper which I found a bit funny. How you can use 8 bits and integers for inference for example. Said to me these guys are trying to catch up.

The problem is Amazon has that data, NOT Nvidia. It is not in Amazon's best interest to help Nvidia; this is my exact point. The entire dynamics of the chip business have changed. You will see Amazon do their own just like Google has.

Once Google did the gen 1 TPUs they set the direction: you just can NOT buy off the shelf and compete long term.

The silicon is strategic for AI.

MS went the wrong direction in using an FPGA solution in addition to using Nvidia. But once again no data for Nvidia.

Market cap does not give you the money. But Google in 2018 will spend about 2x Nvidia 2017 sales! Yes, you read that correctly. Google on R&D will spend 2x Nvidia 2017 sales!

Google profits will be over 4x Nvidia total 2017 sales.

Once again Nvidia does NOT do the entire stack. I am not aware of any algorithm breakthroughs that came from Nvidia. I cannot even name one AI expert at Nvidia.

But the scoreboard is papers accepted at NIPS. Nvidia did NOT get a single paper accepted that I saw at the conference?

Versus Google had more than anyone. 9% of all the papers accepted came from Google.

https://medium.com/machine-learning-in-practice/nips-accepte...

If Nvidia is playing in the entire stack how could they NOT get a single paper accepted at NIPS?

Or did I miss it?

If we look at self-driving cars, one of the most important AI applications, Nvidia does not even show up on patents? Once again Google ahead by a mile.

https://www.theatlas.com/charts/r1iEkmKkz

Something in your post does NOT add up? If Nvidia is a player in the stack besides the silicon, why do they NOT show up anywhere?

Google deploys both, TPUs and Nvidia, for a number of reasons I suspect.

The biggest is they want TF to be the canonical framework for AI and they MUST show not favoring their own solution until it is a done deal which is getting close.

In the end TF will never run as well on Nvidia as it will on the TPUs. We can see it here with about 1/2 the cost using the TPUs over Nvidia.

It is like saying Android would run as well as iOS on the Apple processors. It is all about controlling the entire stack like Apple has done and Nvidia is just not in a position to be able to.

Makes no sense to buy the processors so would not make any sense for Google to sell them to others. Not going to ever see that happen.

But I do think it is possible Google will sell the PVCs.

The ultimate problem is Nvidia is in perpetual catch-up. Right now the big new thing that came from Hinton is Capsule networks and using dynamic routing. Google will have that optimized in silicon long before Nvidia will.

I suspect it will create the need for a different approach to how you access memory in chip architecture.

But Capsule networks are heavy computationally and so silicon will matter a lot. Google has the algorithms, knows how they want to use them in production at scale, and has the money to execute on supporting them in silicon. They just move way too fast for Nvidia to ever be able to catch up.

I'd really like to hear what you think about Nvidia's approach to self-driving, especially using supercomputing + simulation + backtesting to bootstrap the process. We keep going back and forth on this topic, but how can you develop a self-driving platform without a bunch of NNs in production, running on the Nvidia Saturn V supercomputer?

> How you can use 8 bits and integers for inference for example. Said to me these guys are trying to catch up.

I think it's interesting that you presume that Google came up with the idea first, rather than 'reducing precision' being a rather obvious idea that any chip designer or ML practitioner would have brought up. Again, can you please justify that?
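
For anyone following along, here is roughly what 'reducing precision' for inference means in practice. This is a minimal sketch of symmetric 8-bit linear quantization in plain Python, written for illustration; it is not how the TPU or any Nvidia part implements it, and the function names are my own.

```python
def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] plus one shared scale factor."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(a_codes, a_scale, b_codes, b_scale):
    """Dot product accumulated entirely in integers, rescaled to float once."""
    acc = sum(x * y for x, y in zip(a_codes, b_codes))
    return acc * a_scale * b_scale


# Toy "layer": one neuron's weights and one input vector (made-up numbers).
weights = [0.5, -1.0, 0.25, 0.75]
inputs = [1.0, 2.0, -1.0, 0.5]

w_q, w_s = quantize_int8(weights)
x_q, x_s = quantize_int8(inputs)

exact = sum(w * x for w, x in zip(weights, inputs))
approx = int8_dot(w_q, w_s, x_q, x_s)

print("float result:", exact)
print("int8 result: ", approx)
```

The integer result lands close to the float one, while the multiply-accumulate work is done on small integers, which is much cheaper in silicon and in energy. That trade-off is why the idea is natural for anyone designing inference hardware.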

I think where we're at a disconnect is that you equate AI leadership with publishing and patents, while looking at Nvidia, they are an extremely secretive organization that would probably avoid publishing what they see as a competitive advantage. This is similar to how Apple operates.

I used to work in finance, and the culture was the same way: banks had state-of-the-art models internally but would never share them. Published papers in academia were probably ~5 years behind what the banks had.

I do believe that Google (mostly Deepmind) is the leader in the research field, but note that they had to go out and buy that expertise.

> Google on R&D will spend 2x Nvidia 2017 sales!

Yes, but it's not all going into AI for sure, and definitely not into bankrolling the TPU effort. We should compare apples to apples here, surely?

> The entire dynamics of the chip business have changed. You will see Amazon do their own just like Google has.

So what about Nvidia's self-driving efforts? I've talked about it for about 3-4 posts now, with references to presentations and videos, and heard more or less crickets from you about it. I don't see how you can repeatedly say that Nvidia has no access to data when they clearly have a working product (Drive PX2) already, plus more (Drive Xavier) ready to be deployed in cars within the next ~18 months.

> Google deploys both, TPUs and Nvidia, for a number of reasons I suspect.

> The biggest is they want TF to be the canonical framework for AI and they MUST show not favoring their own solution until it is a done deal which is getting close.

Yes, but for those exact same reasons, the TPU will not be a strategic edge for Google, and that lowers the ROI of working on the project.

You can't have it both ways: either the TPU is the secret sauce that drives Google Cloud adoption and gives them a big leg up in AI (in which case, they would want to leverage TF and make it 'run better' on the TPU than on other hardware), or else TF is a neutral platform and it doesn't benefit either party (which I actually agree with).

> It is like saying Android would run as well as iOS on the Apple processors. It is all about controlling the entire stack like Apple has done and Nvidia is just not in a position to be able to.

I think the analogy here is really apt, but also shows why I don't believe in Google's success here long-term.

The iPhone basically invented the smartphone market; its product was 10x better than any other competitor when it was introduced, and it was probably the majority of volume (and definitely profit) for years before Android was able to compete.

The TPU is not heads and shoulders above the competition. The Volta came out literally ~1 year after Pascal and had 10X the tensor throughput; you say that Google isn't standing still, but certainly neither will Nvidia.

Basically, Google is not starting from a 'commanding lead' position like Apple did. And we see today that even though Apple still leads in profits, Samsung is very close, and Android is the vast majority of the market.

Larger ecosystems tend to beat fully vertical stacks in the long term. We see this across many markets and products. So why do you think this will be the exception?

It is good to see Nvidia trying to create a virtual world like Google has. But the problem is Google has the real-life experience to use with their virtual California.

But honestly Nvidia is so far behind in SDC and without any patents it is hard to see them competing.

https://www.theatlas.com/charts/r1iEkmKkz

Yes, it is obvious, I would agree. But Google implemented it in late 2014 and Nvidia did NOT in 2014 or 2015 or 2016, as far as I am aware?

AI is not a secretive area. So if Nvidia had something we would know. On the lack of patents puts them in a very weak position. Especially with SDC.

" but note that they had to go out and buy that expertise."

This is one of the more stupid things I have read in a bit on the Internet.

In the late 90s Larry Page was asked about using AI to make search better. He said that they were doing search to make AI better.

TPUs did NOT even come from Deepmind. But honestly knowing what to buy is important. But Google is miles ahead of everyone without Deepmind. TF also did NOT come from Deepmind. So many other things. I would actually say the Brain team has done a lot more in actual production than even DeepMind. But DeepMind is Google and rather dumb comment, no offense. How old are you?

Yes the TPUs are very strategic. It is how they were able to do AlphaZero. Or more importantly their new Speech offering at a reasonable cost. Without the TPUs that would not be possible and that is a strategic advantage for Google and why Amazon and everyone else will copy.

Buying off the shelf can NEVER give you a strategic advantage.

Google does NOT run their inference at scale on Nvidia. But also training has been moving to TPUs quickly for Google. They offer a choice, but the TPUs are half the price, which shows how much better they are.

You want to get TF to be the canonical solution and then use your fundamental advantages. Just business 101.

"The TPU is not heads and shoulders above the competition. "

We can see the TPUs are heads and shoulders better. Heck they are half the cost. Much bigger advantage than the iPhone. But more importantly they will improve far faster than anything from Nvidia.

"Basically, Google is not starting from a 'commanding lead' position like Apple did. "

Google lead in AI is much, much larger than any of Apple. Heck Apple market share is about 14% and Google with Android has over 80% market share.

"Larger ecosystems tend to beat fully vertical stacks in the long term. "

There is no Nvidia ecosystem that I am aware of? The AI ecosystem is built around TF.

I've been nothing but unfailingly polite to you, and I'm getting tired of you repeatedly resorting to name-calling and insults when you encounter an opposing opinion.

> But DeepMind is Google and rather dumb comment, no offense. How old are you?

Why are you getting all worked up? That's not an insult to Google, simply pointing out that their own organically grown corporate org (including Brain) was not adequate to do the cutting-edge research they felt they needed.

> AI is not a secretive area. So if Nvidia had something we would know. On the lack of patents puts them in a very weak position. Especially with SDC.

I disagree. Think about this the other way: if some company were quietly plugging away at large AI advances and deciding not to publish them, how would you even know? My evaluation of Nvidia's technology is based on their public presentations and the products that have already been released, products that every single AI practitioner on the planet buys and uses, plus the 150+ car ecosystem partners that have decided to go with Nvidia's driving platform [1].

People who are far more deeply enmeshed in this technology than you or I have voted with their feet and decided to build their core competency for the next 5+ years on Nvidia's platform, while Waymo has maybe 2-3 major automotive partners?

> Google lead in AI is much, much larger than any of Apple. Heck Apple market share is about 14% and Google with Android has over 80% market share.

Bottom line, I definitely agree that Google is an AI leader, but I do not believe that the AI future will be run on TPUs, for the simple reason that chipmaking is a risky, expensive endeavour, and Nvidia has much more expertise than Google does in that regard, while having access to a larger partnership, ecosystem, and its own set of data and engineering.

Put it this way, the actual chipmaking stack is more important than the data stack when it comes to making chips. Just think about it—let's say you've run your thousands of NNs to benchmark the workloads on the TPU, and it turns out that CPU-TPU and TPU-TPU bandwidth is the real bottleneck. What do you do as Google? They have no expertise in building interconnects and scaling them, while Nvidia does.

Data only gets you so far; you still need to be able to do the semiconductor engineering + create partnerships, and in that regard Nvidia is light-years ahead.

To belabor the point, if the goal is to make chips, then being good at chipmaking is very important, and Nvidia is closer to Google in data, than Google is to Nvidia in chipmaking.

I will bet you that 2 years down the line, Nvidia will have abandoned its own TPU project and all major players just buying Nvidia chips, both for inferencing and training.

This is exactly the role Intel plays today in CPUs, and it's both natural and reasonable, and the largest reason is because of structural market factors, which you have never even responded to.

Google's cloud is a fraction of the size of AWS's and Azure's, which means Nvidia makes far more money from Voltas than Google will ever save on the TPUs, and can plough that right back into additional R&D. Business people demand a positive ROI. Where will the positive ROI from a TPU come from?

Google is and will be an AI leader. But it will certainly not be doing its own chips.

[1] https://www.nvidia.com/en-us/self-driving-cars/partners/

What name calling? Totally against name calling.

Deepmind is Google but only one aspect. Much of the best research does not even come from the Deepmind unit.

Nvidia had zero papers at NIPS accepted. Plus GANs, Capsule networks, AlphaGo and so many other breakthroughs come from Google.

But maybe I am just unaware. Can you provide some breakthroughs from Nvidia? Maybe I am just unaware?

SDC will be winner take all and Waymo is literally miles ahead of everyone else.

We have recent benchmarks done on Nvidia versus the TPUs and the TPUs are about 1/2 the price of using Nvidia for the same amount of work. That is a big advantage for Google.

But also that was gen 2, and I suspect we will see a gen 3 soon which will be another step forward. Nvidia will constantly be trying to catch up.

"I will bet you that 2 years down the line, Nvidia will have abandoned its own TPU project and all major players just buying Nvidia chips "

This does not make sense to me and I think you had a typo?

BTW, Unless Nvidia makes major advancement Google just could never use Nvidia for their own stuff. The cost would just be way too high. Perfect example is the new Google text to speech using a NN at 16k samples a second. There is just no way Google could have used Nvidia and offer at a competitive price. The joules per inference is just way too expensive with Nvidia.

Google would have loved to buy chips for their stuff from Nvidia. Problem is they just do not have anything they could use at a price they could offer at scale.

So unless Nvidia catches up you will not see Google use Nvidia for their services.

The big new advancement I suspect we will see with Gen 3 is different memory architecture to better support dynamic routing with Capsule networks which came from Hinton.

Then it will be a couple of years before we see the same from Nvidia.

BTW, what is different is nobody is going to be tied to any chip architecture like we had with Intel. Those days are gone. The common layer will be TF. Now has over 98k stars on GitHub.

Besides K8s, what else got to 100k stars faster?

> But maybe I am just unaware. Can you provide some breakthroughs from Nvidia? Maybe I am just unaware?

You must be joking. I've only been pointing you repeatedly to Nvidia breakthroughs, with references and links provided, for the last 5 posts. Either you are blind, or willfully ignorant.

More self-driving accidents are only going to accelerate the pace. 3 years from now the government will be requiring auto makers to use the Drive Constellation [1] for safety testing.

Google has nothing remotely comparable, neither in published research nor in announced products.

We've gone through something like 8 replies and thousands of words, and you have written exactly one sentence addressing Nvidia's developments.

> The joules per inference is just way too expensive with Nvidia.

The Drive Xavier is basically a giant inferencing engine on a power-constrained platform [2]. It will be shipping in quantity in 2019. There is no equivalent Google product even announced.

> Now has over 98k stars on GitHub.

The fact that you are resorting to GitHub stars to make your argument is utterly laughable.

CUDA and cuDNN are the real enablers, which every single DL framework today (including TF) supports. People trust Nvidia far more than they trust Google to be an ecosystem partner.

> Unless Nvidia makes major advancement

Looked through your social media posts, half of them are pro-Google fanboyism.

Nvidia makes major advancements every 6 months across the entire deep learning stack. I keep pointing you towards what they're doing, hoping you have something interesting to say, but all you have to offer is the same tired Google cheerleading.

Look, I'm a long-time investor in both companies and like them both very much. But it's quite obvious you have zero interest in doing even a smidgen of research about Nvidia or their technology.

Anyway, thanks for the replies, but I'm no longer interested in continuing this convo. You don't seem to know anything relevant at all about Nvidia, nor are you interested in learning more, despite all my attempts to point you towards interesting things that they're doing, and why their approach is unique.

[1] https://www.youtube.com/watch?v=lVlqggTiTzY

[2] https://www.engadget.com/2018/01/07/nvidia-xavier-soc-self-d...