HN Mirror



	by ninjha 78 days ago
	> how many work units can run in parallel
	Not the original author, but batching is one very important trick for making inference efficient. You can reasonably run tens to low hundreds of work units in parallel (depending on model size and GPU size) with very little performance overhead.
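
	A back-of-the-envelope sketch of why that overhead stays small, for anyone curious. It assumes a simple roofline-style model of a single decode step: the weights are read from GPU memory once per step no matter how many sequences are in the batch, while compute grows linearly with batch size. Every number below (7B parameters, 2 bytes per parameter, 1 TB/s memory bandwidth, 200 TFLOP/s peak compute) is an illustrative assumption rather than a measurement, and the function names are made up for this example. KV-cache traffic and kernel launch overhead are ignored, so real break-even points will differ.

```python
# Rough roofline-style model of one decode step.
# All default values are illustrative assumptions, not benchmarks.

def step_time_s(batch_size, params=7e9, bytes_per_param=2,
                mem_bandwidth=1.0e12, peak_flops=2.0e14):
    """Estimate the duration of one decode step for a batch of sequences.

    Weights are streamed from memory once per step regardless of batch size.
    Compute costs about 2 FLOPs per parameter per generated token, so it
    scales linearly with the batch. The slower of the two bounds the step.
    """
    memory_time = params * bytes_per_param / mem_bandwidth
    compute_time = 2 * params * batch_size / peak_flops
    return max(memory_time, compute_time)


def tokens_per_second(batch_size, **kwargs):
    """Aggregate throughput: one token per sequence per step."""
    return batch_size / step_time_s(batch_size, **kwargs)


def break_even_batch(bytes_per_param=2, mem_bandwidth=1.0e12,
                     peak_flops=2.0e14):
    """Batch size at which compute time catches up with weight-loading time."""
    return bytes_per_param * peak_flops / (2 * mem_bandwidth)


if __name__ == "__main__":
    for b in (1, 8, 64, 200, 400):
        print(f"batch={b:4d}  step={step_time_s(b) * 1e3:6.2f} ms  "
              f"throughput={tokens_per_second(b):9.0f} tok/s")
    print("break-even batch size:", break_even_batch())
```

	Under these assumptions the step time is flat until the batch reaches the break-even size, so throughput scales almost linearly with batch size up to that point. Past it, the step becomes compute-bound and each extra sequence starts to cost latency.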