| HN Mirror

by blazespin 243 days ago

Verifying math requires something like Lean, which is a huge bottleneck, as the paper explains.

Plus, there isn't a lot of training data in Lean.

Most of the gains come from training on material that's already out there, not from the RLVR part, which just amplifies them a bit.