HN Mirror



	by Tostino 470 days ago
	You are missing something. This is a single stream of inference. You can load up the Nvidia card with at least 16 inference streams and get much higher total throughput in tokens/sec. This is just a single-user chat experience benchmark.
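A toy cost model can show why single-stream and multi-stream numbers differ. It assumes that token-by-token decoding is limited mainly by memory bandwidth: each decode step reads the model weights once, and every stream in the batch shares that read, while each extra stream adds a small cost of its own. The class name `DecodeModel` and all parameter values (14 GB of weights, 1 TB/s bandwidth, 0.5 ms per extra stream) are hypothetical illustrations, not measurements from the benchmark being discussed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodeModel:
    """Hypothetical cost model for one decode step; numbers are illustrative only."""
    weight_bytes: float           # bytes of weights read once per decode step (assumed)
    bandwidth_bytes_per_s: float  # effective GPU memory bandwidth (assumed)
    per_stream_cost_s: float      # extra time each additional stream adds per step (assumed)

    def step_time_s(self, streams: int) -> float:
        # The weight read is shared by every stream in the batch;
        # each stream then adds its own small cost (KV cache reads, compute).
        return self.weight_bytes / self.bandwidth_bytes_per_s + streams * self.per_stream_cost_s

    def per_stream_tokens_per_s(self, streams: int) -> float:
        # Each stream receives one new token per step.
        return 1.0 / self.step_time_s(streams)

    def aggregate_tokens_per_s(self, streams: int) -> float:
        # All streams together produce `streams` tokens per step.
        return streams / self.step_time_s(streams)


if __name__ == "__main__":
    model = DecodeModel(weight_bytes=14e9, bandwidth_bytes_per_s=1e12, per_stream_cost_s=5e-4)
    for n in (1, 4, 16):
        print(f"{n:2d} streams: {model.per_stream_tokens_per_s(n):6.1f} tok/s each, "
              f"{model.aggregate_tokens_per_s(n):7.1f} tok/s total")
```

Under these assumptions, a single-user chat benchmark measures `streams=1`, while a server handling 16 concurrent requests sees a much larger aggregate figure even though each individual stream gets somewhat slower.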