Hacker News
by blazespin 196 days ago
Verifying math requires a proof assistant like Lean, which is a huge bottleneck, as the paper explains.
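For anyone who hasn't used it, here's a rough illustration of what "verified" means in Lean 4 (my own toy example, not from the paper; the theorem name `my_add_comm` is made up). Lean only accepts a statement once it has been written formally and a proof for it type-checks, even for trivial facts:

```lean
-- Lean 4, core library only (no Mathlib).
-- A concrete arithmetic fact, accepted because both sides evaluate to the same value:
example : 2 + 2 = 4 := rfl

-- A general statement; `my_add_comm` is a hypothetical name, proved by citing
-- the core lemma Nat.add_comm. Lean rejects it if the proof doesn't type-check.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Competition- or research-level math has to be formalized the same way before Lean can check it.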

Plus, there isn't much training data in Lean.

Most of the gains come from training on stuff that's already out there, not really from the RLVR (reinforcement learning with verifiable rewards) part, which just amps it up a bit.