Hacker News
by justinl33 514 days ago
> This is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT.

This is a noteworthy achievement.

1 comment

Excuse my ignorance. What does SFT refer to here?
Supervised fine-tuning