	by ssivark 48 days ago
	When doing autoregressive inference, how often do you launch a CUDA kernel? What is the main bottleneck at the throughputs you're operating at?