by N_Lens
200 days ago
Also impressive: on the IMO-ProofBench Basic benchmark, the model achieved nearly 99% accuracy, though it fell slightly behind Gemini Deep Think on the Advanced subset. The approach shifts from "result-oriented" to "process-oriented" verification, which is particularly important for theorem proving, where a rigorous step-by-step derivation matters more than just a numerical answer.
[1] https://arxiv.org/abs/2406.06592
[2] https://arxiv.org/abs/2505.15034