by noch
499 days ago
> So RLHF is the secret sauce behind modern LLMs?

Karpathy wrote[^0]: "RL is powerful. RLHF is not. […] And yet, RLHF is a net helpful step of building an LLM Assistant. I think there's a few subtle reasons but my favorite one to point to is that through it, the LLM Assistant benefits from the generator-discriminator gap. That is, for many problem types, it is a significantly easier task for a human labeler to select the best of few candidate answers, instead of writing the ideal answer from scratch. […] No production-grade actual RL on an LLM has so far been convincingly achieved and demonstrated in an open domain, at scale."

---

[^0]: https://x.com/karpathy/status/1821277264996352246