by gsandahl
310 days ago
Most of the tasks are graded against ground truth. When the answer is a sentence rather than an exact result, an LLM acting as a judge occasionally helps assess it. Example:
Given a long travel journal:
Q: How many cities does the author mention?
GPT-5: 12
Expected: 17
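A minimal sketch of that grading split, assuming exact results are compared after simple normalization and free-text answers are handed to a judge callback. The `grade` function, the single-token rule for "exact result", and the `judge` callable (standing in for an LLM-as-a-judge call) are hypothetical illustrations, not the actual harness:

```python
from typing import Callable, Optional


def grade(
    answer: str,
    expected: str,
    judge: Optional[Callable[[str, str], bool]] = None,
) -> bool:
    """Return True if the model's answer counts as correct (hypothetical helper)."""
    norm_answer = answer.strip().lower()
    norm_expected = expected.strip().lower()

    # Exact match after normalization is always accepted.
    if norm_answer == norm_expected:
        return True

    # Assumption: a single-token expected answer (e.g. a count like "17")
    # is an exact result, so a mismatch is simply wrong.
    if len(norm_expected.split()) <= 1 or judge is None:
        return False

    # Sentence answers are delegated to the judge (stand-in for an LLM call).
    return judge(answer, expected)


print(grade("12", "17"))  # the travel-journal example above: graded wrong
```

With this split, the travel-journal case never reaches the judge: "12" vs "17" is an exact-result mismatch.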