


	by w-ll 810 days ago
	When has latency ever not mattered? Never mind 'chat' use cases: holding a response up for N*1.2 longer than necessary ties up all sorts of other resources upstream and downstream.


ben_w 810 days ago

When it's already faster than I can absorb the response, which for me as an organic brain includes the normal token generation rate of the free tier of ChatGPT.

If I was using them to process far more text, e.g. summarise long documents, or if I was using it as an inline editing assistant, then I'd care more about the speed.


qeternity 810 days ago

> When it's already faster than I can absorb the response

Streaming a response from a chatbot is only one use-case of LLMs.

I would argue the most interesting applications do not fall into this category.


ben_w 809 days ago

On the number of different use cases (categories), I'd agree; I'm not so sure about usage (volume)…

…not yet anyway. Fast moving area, lots of blue water outside the chat interface.


boroboro4 809 days ago

Name one use case where the difference in latency between 200 t/s (fireworks.ai mixtral model) and 500 t/s (groq mixtral) actually matters. Not throughput and not time to first token, but latency.

Groq model shines at latency, not at the other two.
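
For a rough sense of scale, here is a minimal sketch of the arithmetic, assuming a steady decode rate for a single request and ignoring time to first token and queueing. The two rates are the ones quoted above; the response lengths and the function names are hypothetical, chosen only for illustration.

```python
# Rough per-request generation time at a fixed decode rate.
# Assumption: tokens arrive at a steady rate; time to first token is ignored.

# Rates quoted in the comment above (tokens per second).
RATES = {
    "fireworks.ai mixtral": 200.0,
    "groq mixtral": 500.0,
}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds spent decoding num_tokens at a steady tokens_per_second."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def compare(num_tokens: int) -> dict:
    """Map each provider name to its decode time for a response of num_tokens."""
    return {name: generation_seconds(num_tokens, rate) for name, rate in RATES.items()}


# Hypothetical response lengths: a short reply, a long answer, a large document.
for n in (100, 1_000, 10_000):
    times = compare(n)
    summary = ", ".join(f"{name}: {secs:.1f} s" for name, secs in times.items())
    print(f"{n:>6} tokens -> {summary}")
```

Under these assumptions a 1,000-token response takes 5 s at 200 t/s and 2 s at 500 t/s, so the absolute gap only grows large once a single response runs to thousands of tokens or several calls are chained one after another.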
