by nrhrjrjrjtntbt
214 days ago
Yes. The learning comes from running tests on the program and checking that they pass, i.e. running the model as an agent. Tests and the compiler give hard feedback; that's the data outside the model that it learns from. I think modern RLHF schemes already use models to train LLMs, so LLMs teaching each other isn't new. My knowledge is limited, though; it's just based on a read of https://huyenchip.com/2023/05/02/rlhf.html.
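To make the "hard feedback" idea concrete, here is a rough sketch of what such a signal could look like. This is my own illustration, not something from the linked post: `reward_from_tests` is a made-up helper, and the 1.0/0.0 scoring is an assumption. It runs a candidate program together with its tests in a separate Python process and turns pass/fail into a number that a training loop could use as a reward.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def reward_from_tests(candidate_src: str, test_src: str, timeout_s: float = 10.0) -> float:
    """Hypothetical helper: 1.0 if the candidate passes its tests, else 0.0."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "check.py"
        # Candidate code first, then the assertions that exercise it.
        script.write_text(candidate_src + "\n\n" + test_src + "\n")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # A program that hangs counts as a failure.
            return 0.0
    # Exit code 0 means no syntax error, no exception, no failed assert.
    return 1.0 if result.returncode == 0 else 0.0


# Two candidate "model outputs" for the same task (made-up examples).
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"

rewards = [reward_from_tests(good, tests), reward_from_tests(bad, tests)]
```

The point is that the score comes from the interpreter and the tests, not from the model's own opinion of its code. A real setup would presumably sandbox the execution and use finer-grained scoring than pass/fail.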