by qeternity 809 days ago
> When it's already faster than I can absorb the response

Streaming a response from a chatbot is only one use-case of LLMs.

I would argue the most interesting applications do not fall into this category.

2 comments

By number of distinct use cases (categories), I'd agree; by usage (volume), I'm not so sure…

…not yet anyway. Fast-moving area, lots of blue water outside the chat interface.

Name one use case where the difference between a latency of 200 t/s (fireworks.ai's Mixtral model) and 500 t/s (Groq's Mixtral) actually matters. Not throughput and not time to first token, but latency.

Groq's model shines at latency, not at the other two.
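
For anyone who wants to put numbers on the question, here is a rough sketch of the arithmetic. It assumes the quoted rates are per-request decode speeds and that end-to-end time is roughly time to first token plus output tokens divided by decode rate. The 1,000-token response and the 10-step sequential chain are made-up values for illustration, not figures from either provider, and the function names are hypothetical.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float,
                       ttft_seconds: float = 0.0) -> float:
    """Approximate time for one request: time to first token plus decode time."""
    return ttft_seconds + output_tokens / tokens_per_second


def chain_seconds(steps: int, output_tokens: int, tokens_per_second: float,
                  ttft_seconds: float = 0.0) -> float:
    """Approximate time for `steps` calls run one after another,
    where each call has to wait for the previous one to finish."""
    return steps * generation_seconds(output_tokens, tokens_per_second, ttft_seconds)


if __name__ == "__main__":
    # Assumed workload: 1,000 output tokens per call, 10 sequential calls.
    for rate in (200, 500):
        single = generation_seconds(1000, rate)
        chained = chain_seconds(10, 1000, rate)
        print(f"{rate} t/s: one call {single:.1f} s, 10 sequential calls {chained:.1f} s")
```

Under those assumptions a single call takes about 5 s versus 2 s, and the gap only becomes large when calls are chained sequentially. Whether any real use case looks like that is exactly what is being asked.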