HN Mirror


by nrhrjrjrjtntbt 214 days ago

Yes. The learning comes from running tests on the program and ensuring they pass, i.e. running the model as an agent. Tests and the compiler give hard feedback; that's the data outside the model that it learns from.
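
To make the "hard feedback" idea concrete, here is a rough sketch of how test results could be turned into a reward signal. Everything in it is my own illustration, not how any particular lab does it: the function name `reward_from_tests`, the toy `add` task, and the choice of "fraction of tests passed" as the score are all assumptions.

```python
from typing import Dict, List, Tuple


def reward_from_tests(candidate_src: str, func_name: str,
                      cases: List[Tuple[tuple, object]]) -> float:
    """Score generated code: fraction of test cases passed, 0.0 if it does not compile."""
    namespace: Dict[str, object] = {}
    try:
        # "Compiler" feedback: a syntax error means zero reward.
        exec(compile(candidate_src, "<candidate>", "exec"), namespace)
    except Exception:
        return 0.0

    func = namespace.get(func_name)
    if not callable(func):
        return 0.0

    passed = 0
    for args, expected in cases:
        try:
            # "Test" feedback: count a case only if the output matches.
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on this case counts as a failure
    return passed / len(cases) if cases else 0.0


# Hypothetical model outputs for a toy task: implement add(a, b).
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
broken = "def add(a, b) return a + b\n"
cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

for name, src in [("good", good), ("bad", bad), ("broken", broken)]:
    print(name, reward_from_tests(src, "add", cases))
```

The score depends only on what the compiler and the tests report, not on anything the model says about its own output. A real setup would run the candidate in a sandbox rather than calling `exec` in-process.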

I think modern RLHF schemes already have models training LLMs: a separate reward model scores the LLM's outputs. So LLMs teaching each other isn't new.

My knowledge is limited, though; it's based only on a read of https://huyenchip.com/2023/05/02/rlhf.html.

1 comment

suddenlybananas 214 days ago

RLHF
