HN Mirror

Hacker News


	by mips_avatar 181 days ago
	I don't think the models are doing this; time to first token is more of a hardware thing. But people writing agents are definitely doing this. In voice especially, it's worth using a smaller local LLM to handle the acknowledgment before handing it off.
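
A rough sketch of that agent-side pattern, in case it helps. Everything here is an assumption for illustration: `quick_ack` and `full_answer` are hypothetical stand-ins for a small local model and a larger, slower one (both simulated with sleeps), and `speak` stands in for whatever sends text to TTS.

```python
import asyncio


async def quick_ack(user_text: str) -> str:
    # Hypothetical stand-in for a small local model: fast, canned reply.
    await asyncio.sleep(0.01)
    return "Sure, let me check on that."


async def full_answer(user_text: str) -> str:
    # Hypothetical stand-in for the larger, slower model.
    await asyncio.sleep(0.2)
    return f"Here is the detailed answer to: {user_text}"


async def respond(user_text: str, speak) -> None:
    # Kick off the slow request first so its latency overlaps with the ack.
    slow = asyncio.create_task(full_answer(user_text))
    # The user hears the acknowledgment while the big model is still working.
    speak(await quick_ack(user_text))
    # Hand off: deliver the real answer once it is ready.
    speak(await slow)


def run_demo(user_text: str) -> list[str]:
    # Collect what would be spoken, in order, instead of calling a real TTS.
    spoken: list[str] = []
    asyncio.run(respond(user_text, spoken.append))
    return spoken
```

The point is only the ordering: the slow call starts before the acknowledgment is produced, so the ack hides part of the wait rather than adding to it.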