by ag8
271 days ago
It's for any task that has an "eval", which often means verifiable tasks or ones that can be judged by LLMs (e.g. see [0]). There has also been recent work, such as BRPO [1] and similar approaches, on giving more and more "non-verifiable" tasks verifiable rewards!

[0]: https://runrl.com/blog/funniest-joke

[1]: https://arxiv.org/abs/2506.00103