| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by recognity 110 days ago
	The insight about TTFT dominating everything resonates. We're seeing the same pattern in CLI tools — the perceived speed of AI features comes down to how fast you get the first useful output, not total processing time. Curious about your semantic end-of-turn detection: are you using a separate lightweight model for that, or is it baked into the main LLM inference? That seems like the hardest part to get right without adding latency.