| HN Mirror

	by saagarjha 342 days ago
	There's a 2x performance hit from the weird restriction on fp32 accumulation, plus the fact that the 5090 has "fake" Blackwell (no tcgen05), which limits the size and throughput of matrix multiplication through the tensor cores.