Hacker News
justinl33
514 days ago
> This is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT.
This is a noteworthy achievement.
throwaway314155
514 days ago
Excuse my ignorance. What does SFT refer to here?
josephcsible
514 days ago
Supervised fine-tuning