Hacker News
by ml_hardware 2088 days ago
Totally agree! I think the 3090 could be a lot more cost-effective for researchers who want to dabble in NLP. But it really grinds my gears when people post these misleading benchmarks... the 3090 is handicapped to half-rate tensor core performance (for FP16 with FP32 accumulate), while the Titan RTX is not.

So if you're someone who does their work mainly in FP32, you will see improved performance with the 3090. On the other hand, if you are an FP16 speed demon who needs to train GPT-3 over the weekend, stick with your Titans :)


What do you think about TF32 on the 3090? Could it replace FP32 with a 5x speedup?
I've done a lot of work in ML numerics, and I think TF32 is a completely safe drop-in for FP32 for ML workloads. NVIDIA seems to think so too, which is why on the A100 it won't even be an option; it will be the default mode for all FP32 matrix multiplies.
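
For anyone who wants a feel for what the "drop-in" claim means numerically, here is a rough sketch in plain Python. TF32 keeps FP32's 8 exponent bits but only 10 of its 23 mantissa bits, so the sketch emulates it by clearing the low 13 mantissa bits of each input and then accumulating in FP32. The helper names (`to_fp32`, `to_tf32`, `dot`) are made up for illustration, and truncation is an assumption made for simplicity; real hardware may round differently.

```python
import random
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_tf32(x: float) -> float:
    """Emulate TF32 input precision: FP32 exponent range, 10 mantissa bits.

    Assumption: plain truncation of the low 13 mantissa bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the 13 least significant mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(a, b, cast):
    """Dot product with inputs cast by `cast`, accumulated in FP32."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp32(acc + cast(x) * cast(y))
    return acc


rng = random.Random(0)
n = 1024
a = [rng.uniform(-1.0, 1.0) for _ in range(n)]
b = [rng.uniform(-1.0, 1.0) for _ in range(n)]

# FP64 result used as the reference value.
reference = sum(x * y for x, y in zip(a, b))

err_fp32 = abs(dot(a, b, to_fp32) - reference)
err_tf32 = abs(dot(a, b, to_tf32) - reference)
print(f"FP32 inputs: abs error {err_fp32:.3e}")
print(f"TF32 inputs: abs error {err_tf32:.3e}")
```

The TF32 error will be larger than the FP32 one, since each input loses 13 bits. Whether that matters depends on the workload, which is the point being argued above for ML training.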

But on the 3090, I don't think the speedup will be 5x; it should be closer to 2x. The 3090 has 35.6 TF/s at TF32 and the Titan RTX has 16.3 TF/s at FP32. Once again, I think there is handicapping going on with the 3090.
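
The "closer to 2x" estimate is just the ratio of the two peak figures quoted above. A quick sanity check (the constant and function names are only for illustration, and peak throughput is an upper bound, not a measured training speedup):

```python
# Peak throughput figures quoted in the comment, in TFLOP/s.
RTX_3090_TF32 = 35.6
TITAN_RTX_FP32 = 16.3


def speedup(new_tflops: float, old_tflops: float) -> float:
    """Ratio of peak throughputs; an upper bound on the real speedup."""
    return new_tflops / old_tflops


ratio = speedup(RTX_3090_TF32, TITAN_RTX_FP32)
print(f"{ratio:.2f}x")  # prints "2.18x"
```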

So basically no difference from FP32. That sounds very handicapped.