by sailingparrot
1613 days ago
> Systems like this are designed to reach nearly peak performance

The system certainly is. The code running on that system generally isn't: pulling 100% of the FLOPS the GPUs can provide is quite hard. My point was also that it depends on the specific models you are training. Are you training a transformer model in FP32 precision? Then yes, 6K A100s will blow away 10K V100s. Are you training a ConvNet in FP16? Then no, 10K V100s will perform better. The two GPUs have different architectures, and you have to use the model architectures best suited to the A100 to get the speedup marketed by NVIDIA. That marketed number is presumably what FB is using to claim that its 6K-GPU cluster is bigger than OpenAI's 10K one.
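To make the arithmetic behind "bigger" explicit: with 6K vs 10K GPUs, the A100 has to deliver at least 10000 / 6000 ≈ 1.67x the V100's *achieved* per-GPU throughput on your workload before the smaller cluster wins. A minimal sketch, assuming perfect linear scaling across GPUs (real clusters don't quite get this); the per-GPU speedups below are made-up illustrative values, not benchmarks:

```python
def cluster_throughput(num_gpus: int, per_gpu_speed: float) -> float:
    """Aggregate throughput, assuming perfect linear scaling (an assumption)."""
    return num_gpus * per_gpu_speed


def breakeven_speedup(n_new: int, n_old: int) -> float:
    """Per-GPU speedup the newer GPU needs for the smaller cluster to match the larger one."""
    return n_old / n_new


A100_COUNT = 6_000
V100_COUNT = 10_000

needed = breakeven_speedup(A100_COUNT, V100_COUNT)
print(f"A100 must be at least {needed:.2f}x faster per GPU")  # A100 must be at least 1.67x faster per GPU

# Hypothetical achieved A100-over-V100 speedups (illustrative only, not measured).
for workload, speedup in [("workload with 3.0x speedup", 3.0), ("workload with 1.3x speedup", 1.3)]:
    a100_total = cluster_throughput(A100_COUNT, speedup)
    v100_total = cluster_throughput(V100_COUNT, 1.0)  # V100 is the baseline unit
    winner = "6K A100" if a100_total > v100_total else "10K V100"
    print(workload, "->", winner)
```

So whether the 6K cluster is "bigger" depends entirely on which speedup your code actually achieves, not on the peak number from the spec sheet.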