

	by majke 275 days ago
	I always assumed that when one warp waits for results from a long-latency instruction, another warp, potentially from another block, can be scheduled in. I guess this post assumes the need to use all the GPU resources from within a single block.

1 comment

rohany 275 days ago

> I always assumed that when one warp waits for results from a long-latency instruction, another warp, potentially from another block, can be scheduled in.

Yes, that is correct. However, MMA-style kernels that use the Tensor Cores usually need so many resources per block that only one block fits on each SM.
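To make the "only one block fits" point concrete, here is a rough back-of-the-envelope sketch of how many blocks can be resident on one SM. Everything in it is an assumption for illustration and not from the thread: the `SMLimits` defaults are made-up round figures in the range of a recent datacenter GPU, the two kernel configurations are hypothetical, and the model ignores the allocation granularity and per-block reservations that real hardware applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM resource limits (illustrative assumptions, not a specific GPU)."""
    shared_mem_bytes: int = 228 * 1024
    registers: int = 65536
    max_threads: int = 2048
    max_blocks: int = 32


def blocks_per_sm(threads_per_block, regs_per_thread, smem_per_block,
                  sm=SMLimits()):
    """Resident blocks per SM: the tightest of the per-resource limits."""
    by_threads = sm.max_threads // threads_per_block
    by_regs = sm.registers // (regs_per_thread * threads_per_block)
    # A block that uses no shared memory is not limited by it.
    by_smem = (sm.max_blocks if smem_per_block == 0
               else sm.shared_mem_bytes // smem_per_block)
    return min(sm.max_blocks, by_threads, by_regs, by_smem)


# Hypothetical lightweight kernel: small tiles, few registers per thread.
light = blocks_per_sm(threads_per_block=256, regs_per_thread=32,
                      smem_per_block=16 * 1024)

# Hypothetical MMA-style kernel: large shared-memory tiles and many
# registers per thread for accumulators.
heavy = blocks_per_sm(threads_per_block=384, regs_per_thread=168,
                      smem_per_block=160 * 1024)

print(f"light kernel: {light} blocks/SM")  # 8
print(f"heavy kernel: {heavy} blocks/SM")  # 1
```

With one resident block per SM, there is no other block for the scheduler to switch to, so any latency hiding has to come from the warps inside that block. On a real device, the CUDA runtime's `cudaOccupancyMaxActiveBlocksPerMultiprocessor` reports the actual figure for a given kernel.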
