by crackalamoo
372 days ago
Yes, 100% this. And even more so for reasoning models, which have a different kind of RL workflow based on reasoning tokens. I expect to see research labs come out with more ways to use RL with LLMs in the future, especially for coding. I feel it is quite important to dispel this idea given how widespread it is, even though it does gesture at the truth of how LLMs work in a way that's convenient for laypeople. https://www.harysdalvi.com/blog/llms-dont-predict-next-word/