by wesleyyue
842 days ago
more early impressions on performance: besides the endpoint erroring out at a higher rate than openai, time-to-first-token is also much slower :(

p50: 2.14s
p95: 3.02s

And these aren't super long prompts either. vs gpt4 ttft:

p50: 0.63s
p95: 1.47s
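
For anyone who wants to compare numbers like these on their own traffic, here is a rough sketch of one way to time the first token and summarize with p50/p95. It is not the harness behind the figures above: `start_stream` is a hypothetical callable that sends the request and returns an iterator of streamed chunks, and the nearest-rank percentile is just one of several common definitions.

```python
import math
import time
from typing import Callable, Iterable, Sequence


def time_to_first_token(start_stream: Callable[[], Iterable[str]]) -> float:
    """Seconds from sending the request until the first chunk arrives.

    `start_stream` is a hypothetical callable: it should send the request
    and return an iterator over the streamed chunks.
    """
    start = time.perf_counter()
    for _ in start_stream():
        # Stop at the first chunk; the rest of the stream is not consumed.
        return time.perf_counter() - start
    raise ValueError("stream produced no chunks")


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile (assumed definition; others interpolate)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples: Sequence[float]) -> dict:
    """p50 and p95 of a list of TTFT samples, in seconds."""
    return {"p50": percentile(samples, 50), "p95": percentile(samples, 95)}
```

Call `time_to_first_token` once per request, collect the results in a list, and pass it to `summarize`. Use the same prompts and the same client location for both endpoints, otherwise network latency and prompt length will blur the comparison.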