I was originally appalled at the software limiting. But according to Tim Dettmers, who has a solid record of predicting and comparing NVIDIA cards for deep learning performance, it's not really a big deal.
Essentially, from my understanding, memory bandwidth is the real critical path for performance in most cases. The previous generation of Turing cards had more compute than was necessary, so that compute was an underutilized resource.
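To make the bandwidth argument concrete, here is a rough roofline-style sketch. The function names and all the numbers are made up for illustration (they are not real card specs or benchmark results); the only assumption is the usual roofline idea that attainable throughput is the smaller of the compute peak and bandwidth times arithmetic intensity.

```python
def attainable_tflops(peak_tflops, bandwidth_gbs, flops_per_byte):
    """Roofline estimate: the lower of the compute roof and the bandwidth roof.

    All three inputs are hypothetical illustration values, not measured specs.
    """
    # GB/s * FLOP/byte = GFLOP/s; divide by 1000 to get TFLOP/s.
    bandwidth_roof = bandwidth_gbs * flops_per_byte / 1000.0
    return min(peak_tflops, bandwidth_roof)


def is_memory_bound(peak_tflops, bandwidth_gbs, flops_per_byte):
    """True when the workload's intensity sits below the ridge point."""
    ridge_flops_per_byte = peak_tflops * 1000.0 / bandwidth_gbs
    return flops_per_byte < ridge_flops_per_byte


# Hypothetical card: 100 TFLOPS peak, 1000 GB/s, workload at 20 FLOP/byte.
full = attainable_tflops(100.0, 1000.0, 20.0)
# Same card with compute capped in software to half its peak.
capped = attainable_tflops(50.0, 1000.0, 20.0)
print(full, capped, is_memory_bound(50.0, 1000.0, 20.0))
```

With these made-up numbers, halving the compute peak changes nothing because the workload never gets past the bandwidth roof. The cap would only start to matter for workloads whose arithmetic intensity is above the ridge point.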
Also, this Puget benchmark is using an older version of CUDA. I believe performance is much better with CUDA 11.1.