by tbenst
2111 days ago
Thank you for clarifying! I'm still skeptical of the chart's A100 values, but I appreciate your reasonable attempt to de-bias them. It's always easier to critique than to create, so I also want to compliment you on an excellent article :).
One thing I am quite sure of for the A100 is its transformer performance. Large transformers are so strongly bottlenecked by memory bandwidth that you can use memory bandwidth alone to estimate performance, even across GPU architectures. With a pure bandwidth model, the error between Volta and Turing is less than 5%, and NVIDIA's A100 transformer benchmark data shows similar scaling. So I am fairly confident in the transformer numbers.
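To make the "bandwidth alone" idea concrete, here is a minimal sketch of such a model. It assumes a memory-bound transformer's throughput scales linearly with peak memory bandwidth, so the predicted speedup between two GPUs is just the ratio of their bandwidths. The names `PEAK_BANDWIDTH_GBPS`, `predicted_speedup`, and `relative_error` are hypothetical, and the bandwidth figures are published peak specs (V100 and A100 40 GB) used here as assumptions, not measurements from this thread.

```python
# Hypothetical pure-bandwidth performance model (an illustrative assumption):
# for a memory-bound workload, throughput is taken to scale linearly with
# peak memory bandwidth, so speedup = bandwidth ratio.

# Assumed peak memory bandwidth in GB/s (published vendor specs).
PEAK_BANDWIDTH_GBPS = {
    "V100": 900.0,   # Tesla V100, HBM2
    "A100": 1555.0,  # A100 40 GB, HBM2
}


def predicted_speedup(new_gpu: str, old_gpu: str) -> float:
    """Predicted speedup of new_gpu over old_gpu under the bandwidth-only model."""
    return PEAK_BANDWIDTH_GBPS[new_gpu] / PEAK_BANDWIDTH_GBPS[old_gpu]


def relative_error(predicted: float, measured: float) -> float:
    """Relative error of a predicted speedup against a measured one."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    s = predicted_speedup("A100", "V100")
    print(f"A100 vs V100 (bandwidth model): {s:.2f}x")
```

Checking the model against real benchmarks would mean feeding measured speedups into `relative_error`. The "less than 5%" figure above refers to that kind of comparison, and no measured data is reproduced here.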
The computer vision numbers depend more on the network, and it is difficult to generalize across all CNNs. For example, CNNs based on group convolutions or depth-wise separable convolutions do not scale well with better GPUs, so their speedups will be small (1.2-1.5x), whereas other networks such as ResNet get fairly straightforward improvements (1.6-1.7x). So the CNN values are less clear-cut, because there is more diversity among CNNs than among transformers.