by blazespin
196 days ago
Verifying math requires something like Lean, which is a huge bottleneck, as the paper explains. Plus, there isn't much training data in Lean. Most of the gains come from training on material that's already out there; the RLVR part just amps it up a bit.
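
For a sense of what "verifying" means here, a minimal Lean 4 sketch. The theorem name `add_comm_example` is made up for illustration; `Nat.add_comm` is the lemma from Lean's core library.

```lean
-- Hypothetical example: the name `add_comm_example` is invented for illustration.
-- Lean accepts this only if the proof term type-checks against the statement.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The checker either accepts the proof or rejects it, which is what makes it usable as a verifiable reward. The catch is that a problem has to be stated formally like this before anything can be checked.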