by Davidzheng
591 days ago
Not very impressed by the problems they displayed, but I guess there should be some good problems in the set given the comments. (Not in the sense that I find them super easy, but they seem random, not super well-posed, and extremely artificial, in the sense that they seem not to be of particular mathematical interest [or at least the mathematical content of the problem is being deliberately hidden for testing purposes] but constructed according to some weird criteria.) Would be happy to hear an elaboration on the comments by the well-known mathematicians.
|
I’d say these problems strongly encourage that sort of behavior.
I’m also someone who thinks building abilities like this into LLMs would broadly benefit the LLMs and the world, because I think this stuff generalizes. But even if not, it would be hard to say that an LLM that could score 80% on this benchmark would not be useful to a research mathematician. Terence Tao’s dream is something like this that can hook up to Lean, leaving research mathematicians as editors and advisors who occasionally work on the really hard parts while the rest is automated and provably correct. There’s no doubt in my mind that an LLM scoring high on this benchmark would be helpful in that setup.