by Aurornis
48 days ago
> The user won't even notice a delay until you get over 500ms

I think a lot of comments are so laser-focused on the transport delays that they’re forgetting that the LLM pipeline isn’t instant. The transport delays are additive on top of all of the other delays, which are already high. I assume that is why they reached for the lowest-latency solution they could: they need every bit of help they can get to start shrinking the end-to-end delay across the entire pipeline.

Analogies to human voice delay don’t work because in that case we treat the human as having no delay.
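
To make the additive point concrete, here is a rough sketch of the arithmetic. Only the 500 ms threshold comes from the quote above; the stage names and every latency figure are made-up assumptions for illustration, not measurements from any real system.

```python
# Illustrative latency budget. All stage names and numbers are hypothetical.
PERCEPTION_THRESHOLD_MS = 500  # threshold taken from the quoted comment

# Assumed per-stage delays of a voice pipeline, in milliseconds.
pipeline_ms = {
    "speech_to_text": 150,
    "llm_first_token": 200,
    "text_to_speech": 100,
}


def end_to_end_ms(transport_ms: float) -> float:
    """Total delay: transport is added on top of the pipeline stages."""
    return transport_ms + sum(pipeline_ms.values())


for transport_ms in (20, 100, 200):
    total = end_to_end_ms(transport_ms)
    verdict = "over" if total > PERCEPTION_THRESHOLD_MS else "within"
    print(f"transport {transport_ms} ms -> total {total} ms ({verdict} threshold)")
```

Under these assumed numbers the pipeline alone already uses most of the budget, so even a modest transport delay is enough to push the total past the threshold.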