by jacksmith21006 2976 days ago
Hardware optimized for NN. Nvidia's dominant focus had been graphics. That is a big difference, and we can see the results in this article.

Plus it benefits from not having the baggage that Nvidia would have.

But you are never going to be able to use a TPU for graphics.

In the end it is about results.


Tensor cores are hardware optimized for NN. You call it baggage, Nvidia calls it extra revenue, because some people need double precision, and those people are willing to pay a lot of money. So the V100 continues to be the cheapest way to train and do inference on NN because you can actually amortize the server cost over time. With a TPU, you pay the hourly price forever. TPUs are better only in the case of NN jobs that are short in length, or when you don't have the capital to buy a server. For anything longer, you can buy a Titan V and come out far ahead.

By the way, the Tesla cards have no graphics output, so I'm not sure why you'd say they have graphics baggage.

The problem for Nvidia is they do NOT do the entire stack. So Google has the ability to better optimize and here we are seeing those results as using TPUs is about 1/2 the price of Nvidia hardware.

Baggage is a company thing. Google really has been an AI company since the late 90s, when Larry Page was asked about using AI to improve search and he replied he was using search to make AI happen.

Ha! When you amortize you are still spending money, and you saying this really bothers me. It is such a problem.

Too many look at things like you do, and that is why companies get into problems. Capitalizing is not magic.

BTW, Google is also going to be able to iterate much quicker as the AI breakthroughs happen and come out with new versions that should stay well ahead of Nvidia.

The dynamics of the chip business have changed. It used to be that companies bought chips from someone, put them into servers, and sold the servers.

The problem is that the companies making the chips are NOT running the chips and do not have any skin in the game or the data needed to improve.

Now we have companies like Google making the chips and also running the chips, which is why we see power footprint being the focus far more than in the past.

We will see all the big operations including Amazon make their own chips more and more.

A perfect example is Capsule networks replacing some uses of CNNs. Google with Hinton developed the Capsule network approach and will be supporting it far faster than you will see from Nvidia.

Then there is the canonical framework for AI being TF.

All of these were theoretical advantages for Google, and now we get to see that they appear to be real, with the pricing of the TPUs being about half of the cost of using Nvidia.

You still haven't given a single example of what you mean by "doing the entire stack". I'm assuming that's because you don't have one?

You seem to have completely missed why Nvidia's stock has gone up 17x in 4 years while google only 3x. The dynamics of the chip business have not changed; you are focusing on a single market, DNN, which is a small piece of the entire science/engineering community. Google made a chip that accelerates DNN. They also chose not to make an API to use that hardware outside TF. So if you could buy a TPU and put it in your own server, it would beat the V100 in performance/watt. You can't do that, so nvidia wins, because I can buy a V100, and in 51 days the price I bought it for ($8K) has already been burned through in GCP. If you need me to do the math to help you realize that now your only recurring cost on the V100 (power) is more than 100x less than the TPU, I can do that for you. But hopefully you understand now that the TPU is for a niche market outside of google, and it will never be a large source of revenue for them at $6.50/hour.
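For anyone who wants to check the 51-day figure, here is a minimal sketch of that arithmetic. The $8K price and the $6.50/hour rate come from the comment above; the GPU power draw and the electricity price are assumptions for illustration only, and the sketch ignores the rest of the server, cooling, idle time, and any difference in work done per hour.

```python
# Figures quoted in the comment above.
V100_PRICE_USD = 8000.0        # purchase price of a V100
TPU_RATE_USD_PER_HOUR = 6.50   # Cloud TPU hourly price

# Assumptions for illustration only (not from the thread).
ASSUMED_GPU_POWER_KW = 0.3              # assumed power draw of the card
ASSUMED_ELECTRICITY_USD_PER_KWH = 0.10  # assumed electricity price


def break_even_days(purchase_price, hourly_rate):
    """Days of continuous use after which renting has cost as much as buying."""
    return purchase_price / hourly_rate / 24.0


def hourly_power_cost(power_kw, price_per_kwh):
    """Electricity cost of running the card for one hour."""
    return power_kw * price_per_kwh


days = break_even_days(V100_PRICE_USD, TPU_RATE_USD_PER_HOUR)
power_cost = hourly_power_cost(ASSUMED_GPU_POWER_KW, ASSUMED_ELECTRICITY_USD_PER_KWH)
ratio = TPU_RATE_USD_PER_HOUR / power_cost

print(f"break-even after {days:.1f} days of continuous use")
print(f"assumed GPU power cost per hour: ${power_cost:.3f}")
print(f"TPU hourly price is about {ratio:.0f}x that power cost")
```

Under these assumptions the break-even lands a little past 51 days, and the hourly TPU price comes out well over 100x the assumed power cost. Changing the assumed power draw or electricity price moves the ratio, but not the break-even point.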

TF is not exclusive to google. Nvidia has engineers working on TF.

Your capsule example is again extremely poor. You think google can respin an ASIC quicker than nvidia? Not only does history say the exact opposite, but they both use TSMC.

> Nvidia's stock has gone up 17x in 4 years while google only 3x

Not sure the market cap or the P/E are apples to apples there.

Also:

> https://www.cnbc.com/2018/02/23/secretive-chinese-bitcoin-mi...

Not sure why I could not reply to your post so will reply here.

I find the questioning of Google on the entire stack just baffling.

People - Google has the strongest team of AI experts in the industry by a wide margin. At NIPS this year Google had more papers accepted than anyone else. The big AI breakthroughs come from Google. They solved Go a decade earlier than anyone thought possible. I would put FB at #2 for AI experts, but a very distant #2.

Google is miles ahead with SDC.

Plus Google is able to attract the top talent better than anyone else.

https://unsupervisedmethods.com/nips-accepted-papers-stats-2...

Applications - Search, Photos, Speech, AlphaZero, Self Driving Cars. Google now has over 4k NN in production. Nobody else is even in the ballpark. Hands down the leader in applications.

Infrastructure - TensorFlow now has 98k stars on GitHub. It is the canonical AI framework in the industry and really nothing else is close. CNTK is #2 with 14k stars. But Google adds about 7x as many stars per day.

https://github.com/tensorflow/tensorflow

Then there is Google cloud infrastructure and their other engineering talents which are well ahead of anyone.

I can go on but this is so incredibly silly. There is little question that Google is leading at every layer of the AI stack by a wide margin.

This is so silly I suspect something else going on here. We do not seem to be discussing things based on reality.

Is this about Damore?

BTW, Nvidia can only spin up what they know about. Google does not share everything, but luckily for Nvidia they do share a lot.

In 2018 you just have to run the infrastructure to be long term viable in the chip game.

Nobody is arguing about whether Google is the best at AI. You are arguing that this translates to Google being the best at making chips. They aren't, and there's no evidence they are. TensorFlow is open source, and Nvidia contributes. Would you be willing to place a bet on whether more people run TensorFlow on TPU or GPU?

Edit: and there are FAR more CUDA users in general than TensorFlow users, if you're trying to compare apples and oranges.

Well, we have a data point that suggests they created a better chip. But it just makes sense, as they do the entire stack and that gives them the info they need to build a better chip.

Look at capsule networks and dynamic routing. That potentially drives a different architecture, and Google has thousands of production models to use to optimize, which Nvidia just does not have.

Plus it is one company so no IP issues.

But the biggy is we can see half the cost.

No, you have a data point that says they created a chip that performed better on a single test for a single domain of work. You also have a data point that says the TPU can't do any 64-bit simulations. It's not half the cost. See previous comment. It's about 100x the cost after 51 days.