


	by imtringued 800 days ago
	There isn't. For games you would need vLLM, because batch size is more important than latency. Something that people don't seem to understand is that an NPC doesn't need to generate tokens faster than its TTS can speak. You only need to minimize the time to first token.
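	A rough back-of-the-envelope sketch of that point, in Python. The function names are hypothetical, and the numbers in the usage note (speaking rate, tokens per word, aggregate server throughput) are illustrative assumptions, not measurements from vLLM or any particular TTS engine. It estimates how many tokens per second one NPC needs to stay ahead of its own speech, and how many NPCs a batched server could then keep talking at once.

```python
def required_tokens_per_second(words_per_minute: float, tokens_per_word: float) -> float:
    """Token rate one NPC needs so generation keeps pace with TTS playback."""
    if words_per_minute <= 0 or tokens_per_word <= 0:
        raise ValueError("rates must be positive")
    return words_per_minute / 60.0 * tokens_per_word


def max_concurrent_npcs(
    aggregate_tokens_per_second: float,
    words_per_minute: float,
    tokens_per_word: float,
) -> int:
    """How many NPCs a batched server can keep speaking without stalling.

    Assumes the aggregate throughput is shared evenly across the batch,
    which is a simplification of how a real scheduler behaves.
    """
    per_npc = required_tokens_per_second(words_per_minute, tokens_per_word)
    return int(aggregate_tokens_per_second // per_npc)


if __name__ == "__main__":
    # Assumed values: 150 words/minute speech, 1.3 tokens per word,
    # 1000 tokens/second total across the whole batch.
    print(required_tokens_per_second(150, 1.3))
    print(max_concurrent_npcs(1000, 150, 1.3))
```

	With those assumed values a single NPC only needs a few tokens per second, so nearly all of the server's capacity can go toward serving more NPCs per batch. What the player actually perceives is the delay before the first token arrives, which this arithmetic does not capture.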