by harryp_peng
787 days ago
You know, at some point we won't be able to benchmark them, due to the sheer complexity of the tests required. E.g., if you are testing a model on maths, the problem will have to be extremely difficult to even count as a 'hassle' for the LLM; it would then take you a day to work out the solution yourself. See where this is going? When humans are no longer on the same spectrum as LLMs, that's probably the definition of AGI.