by retrovrv 975 days ago
Thanks for sharing my blog here!

Quick notes on the analysis:
- This is based on data from 100+ organizations globally, making a million+ requests a day via Portkey.ai.
- I've randomly sampled 10,000 requests each for GPT-3.5 and GPT-4 every day for the past 3 months.
- Variants like -0613 and -0314 are grouped under GPT-3.5 and GPT-4 for clarity.
- I've plotted latency per token rather than raw latency.
- For each day, I've computed latency per token at the 50th, 75th, 90th, and 99th percentiles, to account for variance in prompt length and complexity and to separate out anomalies.
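For anyone who wants to try the same kind of aggregation on their own logs, here is a rough Python sketch of the steps above: sample requests per day and model, divide latency by token count, and take percentiles. It is an illustration, not the actual pipeline behind the plots. The record fields (`day`, `model`, `latency_ms`, `tokens`), the function names, and the linear-interpolation percentile method are all assumptions.

```python
import random
from collections import defaultdict


def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0..100) of an already sorted list."""
    if not sorted_values:
        raise ValueError("no values to summarize")
    rank = (len(sorted_values) - 1) * p / 100
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (rank - lo)


def daily_latency_percentiles(requests, sample_size=10_000,
                              percentiles=(50, 75, 90, 99), seed=0):
    """Return {(day, model): {percentile: latency per token in ms}}.

    `requests` is an iterable of dicts with the hypothetical keys
    day, model, latency_ms, and tokens.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    per_token = defaultdict(list)
    for r in requests:
        if r["tokens"] > 0:  # skip records that would divide by zero
            per_token[(r["day"], r["model"])].append(r["latency_ms"] / r["tokens"])

    summary = {}
    for key, values in per_token.items():
        # Randomly sample down to sample_size when a group has more requests.
        if len(values) > sample_size:
            values = rng.sample(values, sample_size)
        ordered = sorted(values)
        summary[key] = {p: percentile(ordered, p) for p in percentiles}
    return summary


# Tiny made-up example: five requests on one day for one model.
example = [
    {"day": "day-1", "model": "gpt-4", "latency_ms": 100 * i, "tokens": 100}
    for i in range(1, 6)
]
stats = daily_latency_percentiles(example)
print(stats[("day-1", "gpt-4")][50])
```

Dividing by token count before taking percentiles keeps long and short completions comparable, and reporting the upper percentiles separately from the median stops a few slow outliers from hiding in a single average.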

Happy to share more or answer questions, if any.