by JacobJeppesen
934 days ago
It seems they have made progress in combining reinforcement learning with LLMs. Andrej Karpathy mentions it in his new talk (~38 minutes in) [1], and Ilya Sutskever talks about it in a lecture at MIT (~29 minutes in) [2]. Finding a proper reward function for training LLMs in a reinforcement learning setup would be a huge breakthrough, as would training a model to solve math problems in a way similar to how AlphaGo used self-play to learn Go.

[1] https://www.youtube.com/watch?v=zjkBMFhNj_g&t=2282s

[2] https://www.youtube.com/watch?v=9EN_HoEk3KY&t=1705s