Hacker News
by imtringued 423 days ago
>Our key finding is that all reasoning paths in the RLVR model are already present in the base model.

This is a really good observation. If it holds, you don't need to RL the full model. It may be enough to RL a few LoRA adapters, or maybe a small Mamba model appended after the final layer.
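To make the LoRA half of that concrete, here is a rough sketch of what "only train an adapter" means for a single linear layer. It is pure Python with made-up names (`LoRALinear`, `matvec`, `trainable_params` are all hypothetical, not from the paper or any library), and it shows only the parametrization, not the RL update itself: the base weight `W` stays frozen and the only trainable pieces are the low-rank factors `A` and `B`.

```python
import random

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen: an RL step would never touch this
        # A starts small and random, B starts at zero, so the adapter
        # contributes nothing until training moves B away from zero.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A (r x d_in) and B (d_out x r) would receive gradients.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

# Toy 8x8 base layer standing in for one weight matrix of the base model.
d = 8
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
layer = LoRALinear(W, r=2)
x = [float(i) for i in range(d)]

full_params = d * d
adapter_params = layer.trainable_params()
same_as_base = layer.forward(x) == matvec(W, x)
```

At initialization the adapted layer reproduces the base layer exactly, and the adapter has r * (d_in + d_out) trainable parameters versus d_in * d_out for the full matrix. That gap is tiny in this toy case but grows quickly at real model sizes.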

1 comment

Interesting. Has anyone already experimented with this?