by Who_me 2555 days ago
Hey peeps, full disclosure: I work as one of Linode's R&D engineers. I want to try to get to as many of these questions as I can.

One of the biggest questions is: why the Quadro RTX 6000? A few things:

1. Cost: it has the same performance as the 8000. The difference is 8 more GB of RAM, which comes at a steep premium. Cost is important to us because it allows us to offer a more affordable price point.

2. We have all heard of or used the Tesla V100, and it's a great card. The biggest issue is that it's expensive. So one of the things that caught our eye is that the RTX 6000 has fast single-precision, Tensor, and INT8 performance. Plus, the Quadro RTX supports INT4. https://www.nvidia.com/content/dam/en-zz/Solutions/design-vi... https://images.nvidia.com/content/technologies/volta/pdf/tes... Yes, these are the manufacturer's numbers, but they gave us pause. As always, your mileage may vary.

3. RT cores. This is the first time (TMK) that a cloud provider is bringing RT cores into the market. There are many use cases for RT that have yet to be explored. What will we come up with as a community?!
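On the INT8/INT4 point in item 2: low-precision integer modes trade accuracy for throughput by mapping floating-point values onto a small integer grid. Here is a minimal sketch, assuming plain symmetric per-tensor linear quantization. The function names are hypothetical, and this is not how TensorRT or any other Nvidia library actually implements it.

```python
def quantize_symmetric(values, bits=8):
    """Map floats onto signed integers in [-qmax, qmax] with one shared scale.

    bits=8 gives the INT8 range [-127, 127]; bits=4 gives INT4's [-7, 7].
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; guard against an all-zero input.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [x * scale for x in q]


weights = [0.8, -0.31, 0.05, -1.2, 0.66]
for bits in (8, 4):
    q, scale = quantize_symmetric(weights, bits)
    approx = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, approx))
    print(f"INT{bits}: codes={q} worst error={worst:.4f}")
```

With INT4 the grid has only 15 levels, so the rounding error per value is roughly 18 times larger than with INT8 for the same range; whether that is acceptable depends on the model.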

Now, with all that being said, there is a downside: FP64, aka double precision. The Tesla V100 does this very well, whereas the Quadro RTX 6000 does poorly in comparison. Although those workloads are important, our goal was to find a solution that fits the vast majority of use cases.
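Why FP64 matters for some workloads: single precision has a 24-bit significand, so long accumulations can silently drop small contributions. A minimal sketch that emulates float32 in pure Python with the standard `struct` module (the helper names are hypothetical). It illustrates rounding behaviour only, not the speed of either card.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest float32."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate in emulated single precision (FP32), rounding after every add."""
    total = 0.0
    for v in values:
        # The double-precision sum of two float32 values, rounded back to
        # float32, equals the correctly rounded float32 sum.
        total = to_f32(total + to_f32(v))
    return total


def sum_f64(values):
    """Accumulate in native double precision (FP64)."""
    total = 0.0
    for v in values:
        total += v
    return total


# Above 2**24, float32 can no longer represent every integer, so each +1.0 is lost.
values = [2.0 ** 24] + [1.0] * 10
print("FP32 sum:", sum_f32(values))  # FP32 sum: 16777216.0
print("FP64 sum:", sum_f64(values))  # FP64 sum: 16777226.0
```

Simulations and solvers that accumulate many small terms are the kind of workload where this shows up; most neural-network training tolerates it.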

So is the marketing true when it comes to getting the most out of ML/AI/etc.? Do you need a Tesla to get the best performance? Or is the Tesla starting to show its age? Give the cards a try; I think you'll find that these new Quadro RTX cards with the Turing architecture are not the same as the Quadros of the past.


If you really want low-cost compute for deep learning, need lots of it, and don't want to pay for V100s, then the AMD Vega R7 is the card for you: 700 dollars, 16 GB of RAM, 1 TB/s of GPU memory bandwidth (higher than the V100!), works with TensorFlow (pip install tensorflow-rocm), and about 60% of the performance on ResNet-50. FP64 is not fully gimped (it is halved, I think, so still quite good). Put lots of them in servers with PCIe 4.0, and you can do great things. Here's a recent talk on it:

https://www.youtube.com/watch?v=neb1C6JlEXc
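The "about 60% of the performance for 700 dollars" claim can be turned into a break-even price. A rough sketch, assuming throughput per dollar is the only thing that matters and taking the commenter's figures at face value; the function names are hypothetical. It ignores host cost, power, multi-GPU scaling, and the software-maturity issues raised in the replies.

```python
def perf_per_dollar(relative_perf: float, price_usd: float) -> float:
    """Throughput per dollar, with throughput expressed relative to a reference card."""
    return relative_perf / price_usd


def breakeven_reference_price(relative_perf: float, price_usd: float) -> float:
    """Reference-card price (reference perf = 1.0) giving the same perf per dollar."""
    return price_usd / relative_perf


# Figures quoted in the comment: ~$700 and ~60% of V100 throughput on ResNet-50.
radeon_price = 700.0
radeon_relative_perf = 0.60

breakeven = breakeven_reference_price(radeon_relative_perf, radeon_price)
print(f"A V100 matches on perf/$ only at or below ${breakeven:,.2f}")
```

Any V100 priced above roughly $1,167 loses on raw throughput per dollar under these assumptions.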


Two of my colleagues use high-end AMD GPUs to train RNNs and transformers with tensorflow-rocm. There are still some nasty bugs (e.g. [1]), so it is currently not for everyone. However, given how far they have come compared to 1-2 years ago, it is very likely that in a year or so they will be a real competitor to NVIDIA for compute. That competition is long overdue.

[1] https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/...

Agreed, it is not quite prime-time yet. They are trying to upstream all the ROCm work into TensorFlow, and when it gets into mainline and stabilizes, I agree that it has great potential to take off, particularly with price-sensitive researchers and large companies that need huge GPU farms.
Two questions.

Is Google helping AMD with TensorFlow and ROCm in any way?

What happens when Intel joins the GPU race in 2020? Will they build their own version of ROCm?

This is a terrible suggestion/comparison. AMD has nowhere near the software support in the ML/AI space that Nvidia has. I wish that AMD would invest in a CUDA competitor and break Nvidia's monopoly, but that is not even close to being a reality, unfortunately.
> The difference is 8 more GB of RAM that comes at a steep premium

This is incorrect. The RTX 6000 has 24GB of VRAM and is $4000, and the RTX 8000 has 48GB of VRAM (double the amount) and is $5500. Is it worth the price increase? For a lot of people I know it is.

Also, the RTX Titan is $2500 and is identical to the RTX 6000 (at the chip level) and also with 24GB of VRAM, with the only difference being software enabling of additional H.264/5 encoding features on the Quadro. Definitely not worth the cost increase, especially for anyone doing ML.
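The per-GB arithmetic behind these prices is easy to check. A small sketch using only the figures quoted in this comment (street prices at the time, not current list prices); the helper name is hypothetical.

```python
# Prices and VRAM as quoted in the comment above (USD, GB).
cards = {
    "Quadro RTX 6000": (4000, 24),
    "Quadro RTX 8000": (5500, 48),
    "Titan RTX": (2500, 24),
}


def dollars_per_gb(price: float, vram_gb: float) -> float:
    """Price divided by on-board memory."""
    return price / vram_gb


for name, (price, vram) in cards.items():
    print(f"{name}: ${dollars_per_gb(price, vram):.2f} per GB of VRAM")

# Marginal cost of the RTX 8000's extra memory over the RTX 6000.
p6, m6 = cards["Quadro RTX 6000"]
p8, m8 = cards["Quadro RTX 8000"]
marginal = (p8 - p6) / (m8 - m6)
print(f"Extra {m8 - m6} GB costs ${p8 - p6}, i.e. ${marginal:.2f} per GB")
```

On these numbers the 8000's extra 24 GB costs $62.50 per GB at the margin, cheaper per GB than either 24 GB card as a whole.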

If you reason as a consumer, the RTX Titan makes a lot more sense than the RTX 6000; however, Nvidia forbids datacenters from using consumer cards [1], so their choice makes sense.

[1]: http://fortune.com/2018/01/07/nvidia-consumer-video-cards/

Except "datacenter" is not defined by NVIDIA in their EULA at all, and plenty of large and small datacenters continue to use "consumer cards" regardless of NVIDIA's fearmongering. I know that Tesla, OpenAI, Microsoft, Apple, and many others have continued to buy primarily 2080 Tis, RTX Titans, and Titan Vs since the EULA change.
How is that even legal, and how does Nvidia get away with that type of shit?
Companies make unenforceable claims all the time. That's why we've got courts. They're almost certainly never going to take anyone to court, because if they did, it would get tossed out. They can't pull the same "it's a license to a product" BS that media services do, though they still try with the driver. I think for now they've just run the numbers and figured out that it gives them slightly higher datacenter card sales.
> This is the first time (TMK) that a cloud provider is bringing RT cores into the market.

Your knowledge is incomplete. The T4 has been available in Google Cloud for many months.

I stand corrected thank you!
Has Linode improved its security intrusion and disclosure policy yet?

These are great improvements but are virtually worthless if Linode hasn't changed its behavior.

What incident are you referring to? (genuine question)

As far as standards go, we use Linode, and all of our customers (some of them quite demanding about internal security details) have been satisfied with the various acronyms Linode is accredited with. I understand this doesn't necessarily guarantee anything about response behavior, though, so I'm interested to hear about past incidents.

There were some compromised accounts via a ColdFusion hack of their admin portal.

Not sure if that was isolated.

There was something more recent, too.

Anyway, happy Linode customer for quite a few years now. My stuff works, no fuss.

Any chance you can provide more information? Linode customer as well; slightly concerned.
Google ‘linode coldfusion’. I think it was over 5 years ago.
(Tory from the Linode team here)

We made some improvements to our disclosure / Bug Bounty program last year and launched this on HackerOne. The community and quality of submissions has been great. More information: https://blog.linode.com/2018/05/16/linodes-new-bug-bounty-pr...

We've also been making ongoing improvements to our application security and security infrastructure through the implementation of a DevSecOps culture. This is something we take very seriously.