That's a great question, but it's hard to get enough insight into how Groq is serving models to really know what's missing.
If I had to hazard a guess, it would be that their system architecture (the number of chips and the chip architecture itself) might not be designed for high-concurrency workloads.