Hacker News
by jacksmith21006 2977 days ago
Thanks for sharing; very insightful. I guess the TPUs are the real deal: about 1/2 the cost for similar performance.

I would assume Google is able to do that because the TPUs require less power.

I am actually more curious to see a paper on the new speech NN Google is using. It is supposed to push 16k samples a second through a NN. It is hard to imagine how they did that and were able to roll it out, as you would think the cost would be prohibitive.
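For a sense of scale: if a model generates audio autoregressively, one sample at a time, 16k samples a second leaves a tiny time budget per network evaluation. A rough sketch of that arithmetic, assuming one full sequential forward pass per output sample (true of classic autoregressive WaveNet-style models, not of parallel variants):

```python
def per_sample_budget_us(sample_rate_hz: float) -> float:
    """Microseconds available per sample to keep up with real time,
    assuming one sequential forward pass per generated sample."""
    return 1e6 / sample_rate_hz

for rate in (16_000, 48_000):
    print(f"{rate} Hz -> {per_sample_budget_us(rate):.1f} us per forward pass")
```

At 16 kHz that is 62.5 microseconds per pass for a single real-time stream, which is why batching many streams together matters so much for cost.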

You are ultimately competing with a much less compute-heavy solution.

https://cloudplatform.googleblog.com/2018/03/introducing-Clo...

I suspect this was only possible because of the TPUs.

I can't think of anything where controlling the entire stack, including the silicon, would be more important than AI applications.


Half the cost? Where are you reading that? Yes, on-demand rental in AWS is expensive, but both long-term reservations and buying a V100 yourself are significantly cheaper. Cloud companies have pretty fat margins on on-demand rentals.

You can’t buy a TPU; it’s a cloud-only thing. They also show there’s not a huge difference in either performance or time to converge (albeit for only one architecture).

I would say kudos to the V100, and to this benchmark for puncturing the TPU hype.

The chart has $6.70 per hour for 3186 images per second on Google and $12.20 per hour for 3128 images per second on AWS.

Or am I reading it wrong?

That makes Google close to half as expensive to use, does it not?

BTW, the TPUs are also about twice as fast.

It sounds like Google is pretty far ahead of Nvidia, which really just makes sense: Google does the entire stack and is simply going to have the data to optimize the silicon.
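Working the chart numbers through as images per dollar makes the comparison concrete. This is a sketch using only the prices and throughputs quoted in this comment, reading the image counts as images per second:

```python
# Figures as quoted from the benchmark chart: (USD per hour, images per second).
setups = {
    "Google TPUv2": (6.7, 3186),
    "AWS V100": (12.2, 3128),
}

def images_per_dollar(usd_per_hour: float, images_per_sec: float) -> float:
    """Images processed per dollar: one hour of throughput divided by the hourly price."""
    return images_per_sec * 3600 / usd_per_hour

results = {name: images_per_dollar(*cfg) for name, cfg in setups.items()}
for name, value in results.items():
    print(f"{name}: {value:,.0f} images per dollar")

ratio = results["Google TPUv2"] / results["AWS V100"]
print(f"TPUv2 advantage: {ratio:.2f}x images per dollar")
```

With these inputs the ratio is about 1.85x. The two throughputs are within 2% of each other, so nearly all of the gap comes from the hourly price.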

About half the cost is hype?

I want to be in the cloud and not have to deal with updates, etc. I would think most people feel the same for anything of any scale. I can no longer imagine building rigs and dealing with all the issues. Plus, it is much harder to scale.

It's more a comparison of AWS vs. Google Cloud pricing than Nvidia vs. TPUv2.

Strongly disagree. If Google is able to offer it at about 1/2 the cost using their own silicon, versus AWS using Nvidia, that is all about the silicon difference.

But we also have the V1 TPU paper, and can see the TPUs use fewer joules per inference than an older Nvidia architecture. It was not that close. It just makes sense that Google's V2 TPUs would do the same.

I hope Google does a V3 TPU and then shares a V2 TPU paper, like they did for V1.

What is far more impressive about the TPUs is this:

https://cloudplatform.googleblog.com/2018/03/introducing-Clo...

If they are really doing 16k samples a second through a NN, at a price where you can offer it generally, that is incredible. I want this paper even more.
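Joules per inference is just average power divided by throughput, since a watt is a joule per second. A small sketch of how such a comparison is computed; the power and throughput numbers are hypothetical placeholders, not measurements of any real chip:

```python
def joules_per_inference(avg_power_watts: float, inferences_per_sec: float) -> float:
    """Energy per inference: watts are joules per second, so divide by inferences per second."""
    return avg_power_watts / inferences_per_sec

# Hypothetical numbers for illustration only; not measurements of any real chip.
accel = joules_per_inference(avg_power_watts=40.0, inferences_per_sec=20_000)
gpu = joules_per_inference(avg_power_watts=150.0, inferences_per_sec=15_000)
print(f"accelerator: {accel * 1e3:.2f} mJ/inference, GPU: {gpu * 1e3:.2f} mJ/inference")
print(f"energy ratio: {gpu / accel:.1f}x")
```

A chip can win this metric either by drawing less power or by pushing more inferences through at the same power.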

What makes you so sure it is all the silicon difference and not just AWS pricing their product at a more profitable price point?

These costs also ignore transferring and storing massive data sets in the cloud. In general, the cloud is a huge pain, and I'd avoid it like the plague unless I was stuck and really, really needed the scalability. But even then, that only works if you have a scalable implementation of the algorithm you are working on.

Maybe, maybe not. They have the advantage that they make the hardware, so they're not paying the retail price Nvidia charges for its cards. I don't think there's any way you can say the TPU is cheaper compared to buying your own system. If Google decides to release it to the public, that's a different story. Also, keep in mind that Google allows you to mix and match CPU core count with GPUs, whereas AWS doesn't. It's possible that the Google Cloud price with fewer CPU cores will be much cheaper than the AWS instance.

That is true. But the running cost is really where all the cost is, not so much making the chips.

Yes, I can say it is a lot cheaper. That is what this article is all about.

You can do about twice the images per dollar using the TPUs with GCP versus using Nvidia with AWS.

Or what am I missing?

BTW, Google has released it to the general public. What are you talking about?

"Google’s AI chips are now open for public use"

https://venturebeat.com/2018/02/12/googles-ai-chips-are-now-...

If anything, the pricing likely benefits Google; that is, Google may be more profitable on TPU usage even at 1/2 the cost of Amazon's V100 usage.

FWIW, the "TPU instance" has more than one TPU chip on it.

The architectures are so radically different that I don't think it makes sense to compare anything but whole-system performance. A 1-to-1 comparison of a core or a chip becomes pretty nebulous.

It has more than the chips, too, since the TPUs can't run a TCP/IP stack, gRPC server, etc.

See the chart titled: Performance in images per second per $.

TPUv2 has 1.27x-1.86x the images/s/$.

And the other chart titled: Cost to reach 75.7% top-1 accuracy.

There, TPUv2 costs 62.5% of the reserved GPU instance cost and 42.6% of the unreserved GPU cost.

Key takeaway from the article:

> While the V100s perform similarly fast, the higher price and slower convergence of the implementation results in a considerably higher cost-to-solution.
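Cost-to-solution compounds two factors, hourly price and time to converge, which is why a setup that is only somewhat pricier and somewhat slower to converge ends up considerably more expensive overall. A minimal sketch with hypothetical inputs, plus the 62.5% and 42.6% figures quoted above converted into multipliers:

```python
def cost_to_solution(usd_per_hour: float, hours_to_target: float) -> float:
    """Total spend to reach a target accuracy: hourly price times wall-clock time to converge."""
    return usd_per_hour * hours_to_target

def relative_cost(cost_a: float, cost_b: float) -> float:
    """How many times more setup B costs than setup A for the same result."""
    return cost_b / cost_a

# Hypothetical inputs: B is pricier per hour AND needs more hours to converge.
a = cost_to_solution(usd_per_hour=6.5, hours_to_target=10.0)
b = cost_to_solution(usd_per_hour=8.0, hours_to_target=12.0)
print(f"A: ${a:.2f}, B: ${b:.2f}, B/A = {relative_cost(a, b):.2f}x")

# TPU cost as a share of GPU cost, turned into "GPU costs N times as much".
for share in (0.625, 0.426):
    print(f"TPU at {share:.1%} of GPU cost -> GPU costs {1 / share:.2f}x as much")
```

In the hypothetical, a 23% higher hourly price and 20% slower convergence multiply into roughly 48% higher cost-to-solution.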

The impression I got was the opposite: TPU is not the hot shit that Google claims it is. Pricing is kind of irrelevant since they can subsidize this to create that story.

I know an engineer who prototypes GPU-like systems with FPGAs, and he has told me to be skeptical about performance miracles.

No matter how fast a system is on the inside, you have to get data in and out of it, at the very least to memory. SRAM takes too much area, and there is a limit to DRAM bandwidth despite technologies such as eDRAM and HBM. Some tasks are compute-intensive, but for general tasks, a processor that is 100x faster would need 100x faster memory to really be 100x faster.

Thus advances in real-life performance are likely to be more like a factor of 2.
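This argument is essentially the roofline model: attainable throughput is the lower of peak compute and memory bandwidth times arithmetic intensity (operations per byte moved). A sketch with made-up hardware numbers, chosen only to illustrate the shape of the argument:

```python
def attainable_ops_per_sec(peak_ops: float, mem_bw_bytes: float, ops_per_byte: float) -> float:
    """Simple roofline model: capped by compute or by memory traffic, whichever is lower."""
    return min(peak_ops, mem_bw_bytes * ops_per_byte)

# Hypothetical baseline machine: 10 TOPS peak, 500 GB/s memory bandwidth.
PEAK, BW = 10e12, 500e9
for intensity in (4, 20, 1000):  # arithmetic intensity in ops per byte moved
    base = attainable_ops_per_sec(PEAK, BW, intensity)
    faster_core = attainable_ops_per_sec(100 * PEAK, BW, intensity)        # 100x compute only
    faster_both = attainable_ops_per_sec(100 * PEAK, 100 * BW, intensity)  # 100x compute and memory
    print(f"{intensity:>5} ops/B: 100x compute -> {faster_core / base:.0f}x, "
          f"100x compute+memory -> {faster_both / base:.0f}x")
```

Only the compute-heavy case gains from faster compute alone, and even it stops at 50x here because the memory roof catches up; the low-intensity cases gain nothing until memory gets faster too.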

For training, I never pay full price in the AWS cloud; rather, I run interruptible instances and pay a fraction of the list price. People I know who train in the Google cloud seem to get interrupted all the time, even though they are paying full price.

Inference is another story. Once you have the trained model, you will usually need to run inference many, many more times than you run training, and this becomes more true the bigger the scale you are running at. That hits your unit costs, and it is where you need to pinch every penny.
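Interruptible pricing only wins if the work lost to interruptions stays small, which in practice means checkpointing often. A sketch of the trade-off; the discount and rework fraction are hypothetical parameters, not real AWS or Google Cloud prices:

```python
def effective_training_cost(list_usd_per_hour: float, discount: float,
                            compute_hours: float, rework_fraction: float) -> float:
    """Cost of a training run on interruptible capacity.

    discount: fraction knocked off the list price (0.7 means paying 30% of list).
    rework_fraction: extra hours spent redoing work lost since the last checkpoint,
    as a fraction of the useful compute hours.
    """
    hours = compute_hours * (1 + rework_fraction)
    return list_usd_per_hour * (1 - discount) * hours

# Hypothetical numbers, not real cloud prices.
on_demand = effective_training_cost(10.0, discount=0.0, compute_hours=100, rework_fraction=0.0)
spot = effective_training_cost(10.0, discount=0.7, compute_hours=100, rework_fraction=0.25)
print(f"on-demand: ${on_demand:.0f}, interruptible: ${spot:.0f}")
```

Interruptible capacity comes out ahead whenever (1 - discount) * (1 + rework_fraction) < 1, so a deep discount tolerates a fair amount of lost work.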
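The point about inference dominating at scale falls out of a simple lifetime-cost sum: a one-off training bill plus a per-request serving bill. A sketch with hypothetical prices:

```python
def lifetime_cost(training_usd: float, usd_per_1k_inferences: float, total_inferences: int) -> float:
    """One-off training spend plus per-request serving spend over the model's lifetime."""
    return training_usd + usd_per_1k_inferences * total_inferences / 1000

# Hypothetical: a $50k training run served at $0.01 per 1,000 requests.
for n in (10**6, 10**9, 10**11):
    total = lifetime_cost(50_000, 0.01, n)
    share = (total - 50_000) / total
    print(f"{n:>15,} inferences: total ${total:,.0f}, inference share {share:.0%}")
```

As request volume grows, the fixed training cost becomes a rounding error and the per-inference cost is what sets the unit economics.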

> Pricing is kind of irrelevant since they can subsidize this to create that story.

It depends on how much you plan to use the hardware. If it's running nearly continuously, total cost of ownership is very important. Power costs can quickly dominate TCO.
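A bare-bones TCO sketch for hardware that runs continuously: purchase price plus electricity, with a PUE (power usage effectiveness) multiplier for cooling overhead. All inputs are hypothetical; whether power dominates depends on draw, electricity price, hardware price, and utilization:

```python
def tco_usd(purchase_usd: float, avg_power_kw: float, hours: float,
            usd_per_kwh: float, pue: float = 1.0) -> float:
    """Purchase price plus electricity; pue scales IT power up to include cooling overhead."""
    energy_kwh = avg_power_kw * pue * hours
    return purchase_usd + energy_kwh * usd_per_kwh

HOURS_3Y = 3 * 365 * 24  # running continuously for three years
# Hypothetical machine: $2k up front, 0.4 kW average draw, $0.12/kWh, PUE 1.5.
total = tco_usd(2_000, 0.4, HOURS_3Y, 0.12, pue=1.5)
energy_share = (total - 2_000) / total
print(f"3-year TCO: ${total:,.0f} ({energy_share:.0%} electricity)")
```

For cheap hardware run around the clock, electricity can approach or exceed the purchase price; for very expensive systems it is a smaller share.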

At the pricing extreme, Google could make their TPUs free to use and charge elsewhere in their cloud. This shows that literal pricing is pretty irrelevant.

So could AWS/Nvidia.

AWS, yes. Nvidia, not so sure. When you buy a 1080 Ti you are competing with gamers and miners (and maybe others). There's nothing to subsidize; in fact, those cards are selling above MSRP, because Nvidia isn't selling an ecosystem but a physical card.

> When you buy a 1080 Ti you are competing with gamers and miners (and maybe others). There's nothing to subsidize; in fact, those cards are selling above MSRP, because Nvidia isn't selling an ecosystem but a physical card.

Those cards are also irrelevant to the comparison, as they can't be bought in large quantities for ML workloads. We're talking about Titan Vs and DGX-1s here.

Did you get that impression from this line in the article?

> While the V100s perform similarly fast, the higher price and slower convergence of the implementation results in a considerably higher cost-to-solution.

Full disclosure, I currently work at Nvidia on speech synthesis.

You can definitely do this on a GPU. We use the older autoregressive WaveNets (not Parallel WaveNet) for inference on GPUs, with the newly released nv-wavenet code. Here's a link to a blog post about it:

https://devblogs.nvidia.com/nv-wavenet-gpu-speech-synthesis

That code will generate audio samples at 48 kHz, or, if you're worried about throughput, it'll do a batch of 320 parallel utterances at 16 kHz.
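Putting those two figures side by side in aggregate samples per second, assuming (my reading, not stated explicitly) that each quoted rate is a real-time rate, i.e. every stream keeps pace with playback:

```python
SINGLE_STREAM_HZ = 48_000            # one utterance generated at 48 kHz
BATCH_SIZE, BATCH_HZ = 320, 16_000   # 320 parallel utterances, each at 16 kHz

batched_samples_per_sec = BATCH_SIZE * BATCH_HZ
print(f"single stream: {SINGLE_STREAM_HZ:,} samples/s")
print(f"batched: {batched_samples_per_sec:,} samples/s "
      f"({batched_samples_per_sec / SINGLE_STREAM_HZ:.0f}x the single-stream rate)")
```

Batching trades per-stream sample rate for a much larger aggregate, which is what matters for serving cost.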

> About 1/2 the cost for similar performance.

I would expect a dedicated accelerator to need at least a 5-10X advantage to outweigh all the other infrastructure and ecosystem costs.

GPUs are more useful for a wide variety of data-parallel tasks, and many more NN frameworks work on top of CUDA than work on the TPU.

In terms of horizontal scalability, Nvidia has been rapidly iterating on increasing both memory and interconnect bandwidth (including NVSwitch [1]), while each 'TPU' is actually 4 interconnected chips, so it likely has less room to scale up.

Also note that the tensor cores on a V100 take roughly 25-30% of the actual area. If Nvidia wanted to, they could probably easily make a pure tensor chip that beat the TPU in performance, could be produced in volume on their existing process, and also had full compatibility with their entire stack.

All in all, a 2x price/performance advantage for a hyper-specialized accelerator is basically a loss, just like how nobody installs a Sound Blaster card anymore, how consumer desktops don't run discrete GPUs even though integrated graphics are a few times slower, or

[1] https://www.nextplatform.com/2018/04/04/inside-nvidias-nvswi...

If that 2x price/performance scales across all of Google's inferencing, then it is definitely not a loss for them. If they can halve their running costs for inferencing, they are saving themselves a ton of money. Their TPUv2 was announced slightly before the V100, and the savings they make by not paying Nvidia's premiums probably help. From the customer's point of view, what is a GPU other than a specialised accelerator? Without more details we can't know how a TPU really compares, but if your aim is to train or run inference on TensorFlow models, then they're a really competitive product at the moment.

I agree, but chip development is an expensive business. There is nothing preventing Nvidia from immediately turning around and building a specialised ML accelerator with better software integration and higher bandwidth. For all we know they could already be working on one.

They already did two generations. Google has over $100B in the bank with less than $4B of debt. So money is not an issue. It is tiny in the scheme of things.

Google has an advantage because they do the entire stack and can optimize better, as we see here with half the cost.

Nvidia is actively building an entire deep learning stack internally, all the way to releasing a self-driving simulation platform which they are using to build their own self-driving software [1].

I think they are actually farther along and more aggressive about exploring deep learning use cases in production than Google today; augmenting real data with extensive simulation is really a far-reaching idea that comes directly from their gaming experience.

> So money is not an issue. It is tiny in the scheme of things.

Money, of course, is always an issue in the long term; otherwise, why doesn't Google Fiber just spend tens of billions of dollars to build out its nationwide network? Because it would see negative ROI even if it succeeded.

The TPU has to eventually make a real return to Google, and it won't if Nvidia can spend the same amount of money, build a faster product, and sell it to all the other cloud players, which I believe they definitely can.

Put another way, the TPU has to be cheaper to Google than buying Nvidia GPUs after factoring in its development costs, whereas Nvidia gets to amortize those dev costs over all other cloud providers and all other GPU customers. Google isn't about to sell the TPU to other cloud providers; the entire idea is to use it to drive Google Cloud adoption.

The TPU is a fine chip, but if you just look at the big picture, there is every sign that Nvidia could build the same or a better product for less money, because it has far more synergies across the hardware and chip design stack; e.g. the TPU only has PCIe connectors, while Nvidia has already worked with IBM to get NVLink into supercomputers [2]. For some workloads, the TPU will likely be bandwidth-starved when communicating with the CPU and main memory.
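The amortization argument in numbers: a fixed design (NRE) cost divided by the units shipped, plus the marginal cost of building each chip. Every figure here is hypothetical; the point is only how volume changes the per-chip cost:

```python
def amortized_unit_cost(dev_cost_usd: float, units: int, marginal_cost_usd: float) -> float:
    """Per-chip cost once a fixed design (NRE) cost is spread across every unit shipped."""
    return dev_cost_usd / units + marginal_cost_usd

# Hypothetical: the same $500M design effort, the same $1,000 to build each chip.
in_house = amortized_unit_cost(500e6, units=50_000, marginal_cost_usd=1_000)
merchant = amortized_unit_cost(500e6, units=2_000_000, marginal_cost_usd=1_000)
print(f"in-house only: ${in_house:,.0f} per chip; sold to many customers: ${merchant:,.0f} per chip")
```

Whether the in-house chip still wins then depends on how big the margin is that the buyer would otherwise pay to the merchant vendor.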

[1] https://nvidianews.nvidia.com/news/nvidia-introduces-drive-c...

[2] https://www.ibm.com/us-en/marketplace/power-systems-ac922/de...

The problem is Nvidia is never going to have the AI expertise up and down the stack like Google.

As far as I am aware, Nvidia does not even run a cloud, does it? It is obviously never going to have the production NNs that Google has.

Google now has well over 4k NNs in production, and I'm not sure Nvidia has any. Well over a billion people a day are using Google's NNs. That data allows Google to iterate in ways that Nvidia just never would be able to.

But this was all theory, which is why it is valuable to start seeing more concrete results like this one, where Google, with its TPUs, is able to charge 1/2 the price of using Nvidia. Then we also have Google's paper on the Gen 1.

I would guess Google is working on a gen 3. Nvidia is trying to catch a moving target but without the data. So they are behind, trying to catch up, but missing an arm.

A perfect example of this phenomenon is capsule networks, pioneered by Hinton. They use dynamic routing, which may require a different approach to memory access, as the access pattern differs from that of CNNs or RNNs.

Today the problem is memory access and no longer instruction execution. Google nailed the low-hanging fruit with the Gen 1 TPUs. They have 65,536 very simple cores. Now you have to go after memory access.

Your post is all over the place, so it is a bit hard to respond to. Google Fiber was NOT about cost. It was about AT&T and other established players, along with some local governments, making it difficult for Google to get the access it needed to compete.

I hate debating with someone who is doing what you are doing. Google Fiber? Really?
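The 65,536 "cores" are the 8-bit multiply-accumulate (MAC) cells of the 256 x 256 matrix unit described in Google's first-generation TPU paper. Using the 700 MHz clock reported in that paper (worth double-checking against the paper itself), the peak works out to roughly the 92 TOPS it advertises:

```python
ROWS = COLS = 256   # matrix unit dimensions reported for the first-generation TPU
CLOCK_HZ = 700e6    # clock rate reported in the same paper (700 MHz)

macs = ROWS * COLS                      # one multiply-accumulate cell per position
peak_ops_per_sec = macs * 2 * CLOCK_HZ  # count each MAC as two ops (multiply + add)
print(f"{macs:,} MAC units, peak ~{peak_ops_per_sec / 1e12:.0f} TOPS")
```

Keeping that many MACs busy every cycle is exactly where memory bandwidth becomes the limiting factor.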

"I think they are actually farther along and more aggressive about exploring deep learning"

I do a LOT of surfing on sites and can easily say this is the craziest thing I have read in a bit. You are honestly comparing Nvidia to Google? Really?

Google solved Go a decade early. Hinton did capsule networks and is basically the father of DL, or at least the one who made it actually work. What breakthrough came from Nvidia?

A single one?

There is so much crazy stuff in your posts that this must be driven by something else, something emotional? Your points are just not based on reality. Is this really about Google firing Damore?

BTW, Nvidia read the Google Gen 1 TPU paper, which is why we see them doing similar things. But Google is going to move on to addressing the memory access problems, as that is the next area to improve. Once Google figures it out, you will see Nvidia just copy the approach, like they are doing with the Gen 1 TPUs.

I listened to this Nvidia presentation on YouTube, and they were basically quoting the Google TPU paper, talking about using 8-bit integers, etc., for inference.

Google will release the Gen 3 and then share a paper on the Gen 2, and we will see Nvidia try to copy that one. Nvidia is always a couple of steps behind.

But I am a super curious person, so can you share what this is really all about?
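"Using 8-bit integers for inference" generally means quantizing weights and activations to int8, multiplying in integer arithmetic, accumulating in a wider integer, and rescaling once at the end. A minimal symmetric per-tensor sketch; this is a generic illustration, not Google's or Nvidia's actual scheme, and production toolchains add calibration, per-channel scales, and zero points:

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: map [-max|x|, max|x|] onto the int8 range [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def int8_dot(qa: list[int], sa: float, qb: list[int], sb: float) -> float:
    """Multiply 8-bit integers, accumulate in a wide integer, rescale once at the end."""
    acc = sum(x * y for x, y in zip(qa, qb))  # Python ints stand in for a 32-bit accumulator
    return acc * sa * sb

weights = [0.42, -1.30, 0.07, 0.88]
inputs = [1.10, 0.25, -0.60, 0.95]
qw, sw = quantize_int8(weights)
qx, sx = quantize_int8(inputs)
exact = sum(w * x for w, x in zip(weights, inputs))
approx = int8_dot(qw, sw, qx, sx)
print(f"float dot: {exact:.4f}, int8 dot: {approx:.4f}")
```

The integer products are cheap in silicon, and the rounding error stays small as long as the values are reasonably well scaled.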