Hacker News
by itake 812 days ago
Our p99 for gpt4 is 3s. Images take up to 50s.

So how would you go about improving that?
Not using an LLM for it.
We only send 0.5-5% of traffic to gpt4, thanks to smaller, faster, cheaper models, so not all of our traffic is hit with 50s latencies :-/
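
For illustration, routing most traffic to smaller models and escalating only a small fraction to gpt4 could look roughly like the sketch below. It is not the commenter's actual system: the `Tier` type, the `route` function, the confidence scores, and both stub models are hypothetical stand-ins, and it assumes each model can report some confidence value for its answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str
    # Hypothetical model interface: returns (answer, confidence in [0, 1]).
    model: Callable[[str], Tuple[str, float]]
    # Minimum confidence needed to accept this tier's answer.
    min_confidence: float


def route(prompt: str, tiers: List[Tier]) -> Tuple[str, str]:
    """Try tiers in order and return (tier_name, answer).

    Every tier except the last is accepted only if it is confident enough;
    the last tier is the fallback and is always accepted.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    for tier in tiers[:-1]:
        answer, confidence = tier.model(prompt)
        if confidence >= tier.min_confidence:
            return tier.name, answer
    fallback = tiers[-1]
    answer, _ = fallback.model(prompt)
    return fallback.name, answer


def small_model(prompt: str) -> Tuple[str, float]:
    # Stub (assumption): pretend the small model is confident on short prompts only.
    confidence = 0.9 if len(prompt) < 20 else 0.3
    return "small:" + prompt, confidence


def large_model(prompt: str) -> Tuple[str, float]:
    # Stub (assumption): stands in for the slow, expensive model.
    return "large:" + prompt, 1.0


TIERS = [
    Tier("small", small_model, min_confidence=0.8),
    Tier("large", large_model, min_confidence=0.0),
]
```

With these stubs, `route("hi", TIERS)` is answered by the small tier, while a long prompt falls through to the large one. In a real setup the threshold would be tuned so that only the hard cases pay the large model's latency.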