by gsandahl
310 days ago
We run task-specific benchmarks across a number of categories (agentic tasks, context tasks, normalization tasks, etc.), and on our benchmarks GPT-5 rates slightly below o3, but at a much lower cost. See https://opper.ai/models

Example: given a long travel journal, how many cities does the author mention?
GPT-5: 12
Expected: 17