Hacker News new | ask | show | jobs
by wesleyyue 842 days ago
more early impressions on performance: besides the endpoint erroring out at a higher rate than openai, time-to-first-token is also much slower :(

p50: 2.14s p95: 3.02s

And these aren't super long prompts either. vs gpt4 ttft:

p50: 0.63s p95: 1.47s