

	by imtringued 423 days ago
	> Our key finding is that all reasoning paths in the RLVR model are already present in the base model.

	This is a really good observation. It means that you don't need to RL the full model. You merely need to RL a few LoRAs, or maybe a small Mamba model appended to the final layer.
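For anyone unfamiliar with the LoRA part of that suggestion, here is a rough sketch of the structure: the base weight stays frozen and only two small low-rank matrices would be updated. This is plain Python for illustration only; `LoRALinear`, `matvec`, `rank`, and `alpha` are hypothetical names, not any library's API, and the RL training loop itself is not shown.

```python
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update (alpha / rank) * B @ A.

    Illustrative sketch only: names and defaults are assumptions.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen base weight, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the adapter
        # is a no-op until training changes B.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)               # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        # Only A and B would receive gradient updates.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.weight) * len(self.weight[0])
```

With an 8x8 base weight and `rank=1`, the adapter has 16 trainable parameters against 64 frozen ones, and the gap widens quickly at real model sizes. Whether RL on adapters alone recovers the full RLVR gains is exactly the open question here; the sketch only shows why it would be cheap to try.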

1 comment

Interesting. Has anyone already tried this experimentally?