HN Mirror

Hacker News


	by zhihaojia 407 days ago
	You are right that CUDA Graphs can help reduce kernel launch overhead, but they do not support overlapping computation and communication across layers, since data dependencies are described at the kernel level.
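	To make the dependency-granularity point concrete, here is a toy Python model. It is not the CUDA Graphs API and not how any particular runtime schedules work. It assumes each layer is a compute kernel followed by a communication kernel (e.g. an all-reduce), both split into equal unit-time tiles. Compute and communication are assumed to run on independent resources. The tile-j-waits-on-tile-j dependency is a hypothetical simplification, since real layers may need more than one producer tile. The point is only to compare kernel-level edges with finer-grained edges.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: each layer = compute kernel, then communication
# kernel, each split into `tiles` tiles. Compute and communication run on
# separate resources that may overlap in time, but each resource executes
# its own tiles one at a time, in program order.

Task = Tuple[str, int, int]  # (kind, layer, tile)


def makespan(layers: int, tiles: int, tile_level: bool,
             compute_time: float = 1.0, comm_time: float = 1.0) -> float:
    finish: Dict[Task, float] = {}
    free = {"compute": 0.0, "comm": 0.0}  # time each resource becomes idle

    def deps(kind: str, layer: int, tile: int) -> List[Task]:
        # Tasks that must finish before (kind, layer, tile) may start.
        if kind == "comm":
            src = ("compute", layer)      # communicate this layer's output
        elif layer > 0:
            src = ("comm", layer - 1)     # next layer needs reduced input
        else:
            return []
        if tile_level:
            # Assumed tile-wise dependency: tile j waits only on tile j.
            return [(src[0], src[1], tile)]
        # Kernel-level dependency: wait for every tile of the producer kernel.
        return [(src[0], src[1], t) for t in range(tiles)]

    for layer in range(layers):
        for kind, dur in (("compute", compute_time), ("comm", comm_time)):
            for tile in range(tiles):
                ready = max((finish[d] for d in deps(kind, layer, tile)),
                            default=0.0)
                start = max(free[kind], ready)
                finish[(kind, layer, tile)] = start + dur
                free[kind] = start + dur
    return max(finish.values())


print(makespan(layers=2, tiles=4, tile_level=False))  # 16.0
print(makespan(layers=2, tiles=4, tile_level=True))   # 9.0
```

	With kernel-level edges the schedule is fully serialized (2 × layers × tiles time units), because layer i+1's compute cannot start until the whole communication kernel of layer i has finished. With tile-level edges, the next layer's compute tiles start while the previous layer's communication is still in flight.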