When it's already faster than I can absorb the response, which, for me as an organic brain, includes the normal token generation rate of ChatGPT's free tier.
If I were using them to process far more text, e.g. to summarise long documents, or as an inline editing assistant, then I'd care more about the speed.
Name one use case where there's a meaningful difference in latency between 200 t/s (fireworks.ai mixtral model) and 500 t/s (groq mixtral). Not throughput, and not time to first token, but latency.
Groq's model shines at latency, not at the other two.
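For a rough sense of what the two quoted rates mean in wall-clock time, here is a back-of-the-envelope sketch. It assumes tokens stream at a steady rate and ignores time to first token; the response lengths (300 and 3000 tokens) are hypothetical, chosen only to contrast a short chat reply with a long summary.

```python
# Decode rates quoted in the thread, in tokens per second.
RATES = {"fireworks.ai mixtral": 200, "groq mixtral": 500}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a constant rate (time to first token ignored)."""
    return num_tokens / tokens_per_second


# Hypothetical response lengths: a short chat reply and a long summary.
for num_tokens in (300, 3000):
    for name, rate in RATES.items():
        seconds = generation_seconds(num_tokens, rate)
        print(f"{num_tokens:>5} tokens @ {name}: {seconds:.1f} s")
```

Under these assumptions, a 300-token reply takes 1.5 s versus 0.6 s, while a 3000-token output takes 15.0 s versus 6.0 s. The gap is barely noticeable for a short reply but grows with output length.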