HN Mirror


by Aurornis 48 days ago

> The user won't even notice a delay until you get over 500ms

I think a lot of comments are getting so laser-focused on the transport delays that they’re forgetting the LLM pipeline isn’t instant.

The transport delays are additive on top of all of the other delays, which are already high.

I assume that’s why they reached for the lowest-latency solution they could: they need every bit of help they can get to start shrinking the end-to-end delay across the entire pipeline.
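A minimal sketch of the "additive" point, assuming a strictly sequential pipeline where each stage waits for the previous one. The stage names and millisecond figures are hypothetical, chosen only for illustration, not measurements from any real system.

```python
# Hypothetical per-stage latencies (ms) for one voice-agent turn.
# Illustrative assumptions only, not measured values.
STAGES_MS = {
    "speech_to_text": 300,
    "llm_first_token": 400,
    "text_to_speech_first_audio": 200,
}


def end_to_end_ms(stages: dict[str, int], transport_ms: int) -> int:
    """In a sequential pipeline, transport delay simply adds to every other stage."""
    return transport_ms + sum(stages.values())


for transport in (20, 100, 250):
    total = end_to_end_ms(STAGES_MS, transport)
    print(f"transport={transport:>3} ms -> end-to-end={total} ms")
```

With these made-up numbers the model stages alone already sum to 900 ms, so even the fastest transport shown leaves the turn well past a 500 ms threshold; shaving transport time helps, but it is one term in the sum.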

Analogies to human voice-call delay don’t work, because in that case we treat the human as having no delay.


jedberg 48 days ago

And that was the entire point of my comment: your transport layer isn't your bottleneck. You can start processing before they finish speaking. Your bottleneck will always be what happens after that.
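A rough sketch of why starting before the user finishes speaking matters, comparing batch transcription (start only after speech ends) with streaming transcription (process each audio chunk as it arrives). Everything here is a hypothetical model: the real-time factor `RTF`, chunk size, and `DOWNSTREAM_MS` (standing in for LLM plus text-to-speech time) are assumed values, and per-chunk transport delay is ignored.

```python
# Hypothetical parameters, for illustration only.
SPEECH_MS = 3000      # how long the user talks
CHUNK_MS = 200        # size of each audio chunk sent over the transport
RTF = 0.3             # assumed speech-to-text real-time factor (compute time / audio time)
DOWNSTREAM_MS = 700   # assumed LLM + text-to-speech time before the first reply audio


def batch_tail_ms(speech_ms: int, rtf: float) -> float:
    """Transcribe only after the user stops: the whole utterance is processed afterwards."""
    return speech_ms * rtf


def streaming_tail_ms(speech_ms: int, chunk_ms: int, rtf: float) -> float:
    """Transcribe each chunk as soon as it is available (zero transport delay assumed).

    Returns how much transcription work is still pending when the user stops speaking.
    """
    t = 0              # audio time received so far
    busy_until = 0.0   # time at which the recognizer clears its backlog
    while t < speech_ms:
        chunk = min(chunk_ms, speech_ms - t)
        t += chunk                                   # chunk fully available at time t
        busy_until = max(busy_until, t) + chunk * rtf
    return busy_until - speech_ms


batch = batch_tail_ms(SPEECH_MS, RTF) + DOWNSTREAM_MS
stream = streaming_tail_ms(SPEECH_MS, CHUNK_MS, RTF) + DOWNSTREAM_MS
print(f"batch:     {batch:.0f} ms after the user stops")
print(f"streaming: {stream:.0f} ms after the user stops")
```

Under these assumptions, streaming leaves only the last chunk's transcription (60 ms) after the user stops, so nearly all of the remaining wait is the assumed downstream stage. If the recognizer runs slower than real time (`RTF > 1`), a backlog builds and the streaming advantage shrinks.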
