Hacker News
by imtringued 752 days ago
There isn't. For games you would need vLLM, because batched throughput across many NPCs matters more than per-token latency for any single one. Something people don't seem to understand is that an NPC doesn't need to generate tokens faster than its TTS can speak them. The only latency you need to minimize is the time to first token.
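To put rough numbers on that, here is a back-of-the-envelope sketch. The speaking rate (150 words per minute), the tokens-per-word ratio (1.3), and the 1000 tokens/s aggregate server throughput are all assumptions picked for illustration, not measurements, and the function names are made up for this example.

```python
import math


def required_tokens_per_second(words_per_minute: float = 150.0,
                               tokens_per_word: float = 1.3) -> float:
    """Token rate one NPC needs so generation keeps pace with its TTS.

    Both defaults are assumed ballpark figures, not measured values.
    """
    return words_per_minute / 60.0 * tokens_per_word


def max_concurrent_npcs(aggregate_tokens_per_second: float,
                        per_npc_rate: float) -> int:
    """How many NPCs can talk at once if the server's batched throughput
    is split evenly and each NPC only needs to keep up with its voice."""
    return math.floor(aggregate_tokens_per_second / per_npc_rate)


# Assumed defaults: 150 wpm and 1.3 tokens per word.
per_npc = required_tokens_per_second()

# Hypothetical server doing 1000 tokens/s in aggregate across its batch.
npcs = max_concurrent_npcs(1000.0, per_npc)

print(f"per-NPC rate needed: {per_npc:.2f} tokens/s")
print(f"NPCs that can speak concurrently: {npcs}")
```

Under those assumptions each NPC needs only a few tokens per second, so spending the hardware on a bigger batch buys far more than making one stream faster. Time to first token is the part the player actually notices.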