
Hacker News


	by mdp2021 41 days ago
	I may have misunderstood: isn't reasoning training (RLVR, reinforcement learning with verifiable rewards) independent of the use of "<think>" tags? Isn't it a method that improves reasoning results on its own? How do we know it was not carried out? Incidentally, I am trying to spend some time researching progress in this area (the jump from parroting, to inconsistent apparent reasoning, to reliable reasoning).