by HALtheWise
496 days ago
The name is very intentional: this isn't "AI's Last Evaluation", it's "Humanity's Last Exam". There will absolutely be further tests for evaluating the power of AIs, but the intent of this benchmark is that any more difficult benchmark will either be:

- Not an "exam" composed of closed-form questions with a single, objectively correct answer, or
- Not made up of questions that humans are capable of answering.

For example, a future evaluation for an LLM could consist of playing chess really well, solving the Riemann Hypothesis, or curing some disease, but those aren't tasks you would ever put on an exam for a student.