


	by justinl33 514 days ago
	> This is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This is a noteworthy achievement.

1 comment

throwaway314155 514 days ago

Excuse my ignorance. What does SFT refer to here?

josephcsible 514 days ago

Supervised fine-tuning