by iFire
110 days ago
Huh? I thought the latest advance in computing (spring 2025, i.e. last year) was self-play / reinforcement learning, since we ran out of training data a few years ago. https://github.com/OpenPipe/ART

The reinforcement learning works by having the large language model devise puzzles that it then solves, with the solutions scored via LLM-as-judge. In LLM-as-judge, your LLM generates 8-12 trajectories and a different LLM judges the results. For the problem of ISA-assembly creation, I'd use an oracle such as actual execution on a Windows or Linux operating system. The winning entries are then used to train the large language model.
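A minimal sketch of that loop, under loudly stated assumptions: a toy stack-machine interpreter (`run_program`, hypothetical) stands in for assembling and running the code on a real Windows or Linux machine, and a random program sampler (`sample_trajectory`, hypothetical) stands in for the LLM. For each task it samples a batch of trajectories and keeps only the ones the execution oracle verifies. It does not use or reflect the ART library's API.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

# Hypothetical toy "ISA": a tiny stack machine standing in for real assembly.
# In the setup described above, the oracle would instead assemble the
# candidate and execute it on an actual Windows or Linux machine.
OPS = ["PUSH1", "PUSH2", "ADD", "MUL", "DUP"]


def run_program(program: list[str]) -> int | None:
    """Execution oracle: run the program and return the top of the stack,
    or None if it faults (stack underflow, unknown op, empty result)."""
    stack: list[int] = []
    for op in program:
        if op == "PUSH1":
            stack.append(1)
        elif op == "PUSH2":
            stack.append(2)
        elif op == "DUP":
            if not stack:
                return None
            stack.append(stack[-1])
        elif op in ("ADD", "MUL"):
            if len(stack) < 2:
                return None
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "ADD" else a * b)
        else:
            return None  # unknown instruction
    return stack[-1] if stack else None


def sample_trajectory(rng: random.Random, max_len: int = 6) -> list[str]:
    """Hypothetical stand-in for the LLM policy: emit a random program."""
    return [rng.choice(OPS) for _ in range(rng.randint(1, max_len))]


@dataclass
class Example:
    target: int
    program: list[str]


def collect_winners(target: int, n_rollouts: int, rng: random.Random) -> list[Example]:
    """Sample n_rollouts trajectories (the comment suggests 8-12) and keep
    only those the oracle verifies as producing `target`."""
    winners = []
    for _ in range(n_rollouts):
        program = sample_trajectory(rng)
        if run_program(program) == target:
            winners.append(Example(target, program))
    return winners


if __name__ == "__main__":
    rng = random.Random(0)
    dataset: list[Example] = []
    for target in [2, 3, 4]:
        dataset.extend(collect_winners(target, n_rollouts=12, rng=rng))
    # `dataset` would then feed a fine-tuning step (not shown).
    print(f"{len(dataset)} verified examples")
```

Keeping only oracle-verified winners and fine-tuning on them is essentially rejection-sampling fine-tuning; swapping the LLM judge for execution makes the reward a hard pass/fail check rather than another model's opinion.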