by ponywombat 852 days ago
This is very impressive, but whilst it was very fast with Mixtral yesterday, today I waited 59.44s for a response. If I were to use your API, the end-to-end response time would matter much more to me than the Output Tokens Throughput and Time to first token metrics. Will you also publish average / minimum / maximum end-to-end times?
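
To make the request concrete, here is a minimal sketch of the kind of end-to-end summary being asked about. It is not Groq's methodology: `time_call`, `summarize_end_to_end`, and the `send_request` callable are hypothetical names, and "end-to-end" is assumed to mean wall-clock time from sending a request until the full response has arrived, as seen by the client.

```python
import time
from statistics import mean


def time_call(send_request):
    """Return the wall-clock seconds taken by one call to send_request().

    send_request is a hypothetical zero-argument callable that sends a
    request and blocks until the complete response has been received.
    """
    start = time.perf_counter()
    send_request()
    return time.perf_counter() - start


def summarize_end_to_end(durations_s):
    """Return the average, minimum, and maximum of measured durations (seconds)."""
    if not durations_s:
        raise ValueError("need at least one measurement")
    return {
        "avg": mean(durations_s),
        "min": min(durations_s),
        "max": max(durations_s),
    }


if __name__ == "__main__":
    # 59.44 is the wait reported above; the other two values are made up
    # purely for illustration.
    samples = [0.8, 1.2, 59.44]
    print(summarize_end_to_end(samples))
```

A single slow request dominates the average and the maximum here, which is why throughput and time-to-first-token figures alone may not reflect what an API user experiences.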
1 comment

Yes, sorry about that, it's because of the huge uptick in demand we've had since we went viral. We're building out more and more hardware to cope with demand. I don't think we have any quality of service guarantees for our free tier, but you can email sales@groq.com to discuss your needs.