by natolambert
509 days ago
As the other commenter said, R1 required very standard RLHF techniques too.
A fun way to think about it is that reasoning models are going to be bigger and will lift the RLHF boat along with them. But we need a few years to establish the basics before I can write a cumulative RL for LLMs book ;)