by LarsDu88
11 days ago
This was exactly what I was thinking of. RLVR (reinforcement learning with verifiable rewards) is the secret sauce behind o3 and its many successors. It's why the current models are so great at coding and soon to be unbeatable at math. LLMs can pose many questions, and if the answers are easily verifiable, the models can be fine-tuned very heavily on them. A lot of the world-models discussion will inevitably lean into simulations as verification.
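To make the "pose questions, verify, fine-tune" idea concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: `make_problem`, `verify`, and `collect_verified` are hypothetical names, the task is plain integer addition, and no real training pipeline is claimed to work this way. It only shows the shape of a verifiable reward: a checker that needs no human judgment and returns 1.0 or 0.0.

```python
import random


def make_problem(rng):
    """Hypothetical problem generator: returns (question, expected_answer)."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    return f"{a} + {b}", a + b


def verify(answer_text, expected):
    """Verifiable reward: 1.0 if the answer parses to the expected integer, else 0.0."""
    try:
        return 1.0 if int(answer_text.strip()) == expected else 0.0
    except ValueError:
        # Unparseable output counts as wrong rather than crashing the loop.
        return 0.0


def collect_verified(samples):
    """Keep only (prompt, answer) pairs that pass the checker.

    `samples` is an iterable of (prompt, answer_text, expected) tuples.
    The surviving pairs are what a fine-tuning step could train on.
    """
    return [(p, a) for p, a, e in samples if verify(a, e) == 1.0]
```

In a real setup the checker would be something like a unit-test suite or a proof checker, and, per the point about world models, possibly a simulator.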