by attentionmech
535 days ago
My first thought after seeing this post was that it's a real-world eval. We have been running out of evals lately (the ARC-AGI test, then the sudden jump on FrontierMath, etc.). So it's good to have real-world tests like this that show how far along we are.