by recognity
110 days ago
The insight about TTFT dominating everything resonates. We're seeing the same pattern in CLI tools: the perceived speed of AI features comes down to how fast you get the first useful output, not total processing time. Curious about your semantic end-of-turn detection: are you using a separate lightweight model for that, or is it baked into the main LLM inference? That seems like the hardest part to get right without adding latency.