| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by storystarling 153 days ago
	It sounds like those workloads are memory bandwidth bound. In my experience with generative models, the compute units end up waiting on VRAM throughput, so throwing more wattage at the cores hits diminishing returns very quickly.

2 comments

zozbot234 153 days ago

If they were memory bandwidth bound wouldn't that in itself push the wattage and thermals down comparatively, even on a "pegged to 100%" workload? That's the very clear pattern on CPU at least.

link

touisteur 152 days ago

That's my experience as well, after monitoring frequency and temp on lots of kernel on all the spectrum from memory-bound, to L2-bound to compute-bound. Hard to reach the 600W with memory-bound kernel. TensorRT manages it somehow with some small to mid networks but perf increase seems capped around 10% too even with all the magic inside.

link

touisteur 152 days ago

I thought so but no, iterative small matrix multiplication kernel in tensor cores, or pure (generative) compute with ultra-late reduction and ultra-small working memory. nsight-compute says everything is in L1 or small register file, no spilling, and that I am compute bound, good ILP. Can't find a way to get more than 10% for the 300W difference. Thus asking if anyone did better and how and how reliable the HW stays.

link