by josh-sematic
643 days ago
They are indeed similar: OpenAI used RL at training time in a way that had not been done before, and so does this approach. Yes, both also involve some additional inference-time generation, but the problem is that (at least as of now) you can't get standard LLMs to actually do well with extra inference-time generation unless the training process uses RL to teach them to do so effectively. I'm working on a blog post, aimed at an HN-level audience, that explains this in more detail. Stay tuned!
HN discussion here: https://news.ycombinator.com/item?id=41723384