Hacker News
by makefunstuff 118 days ago
Maaaybe, just mayyybe, there aren't enough examples of valid assembly in the training data? And spitting out something it was fed from scraped OSS repos is an easy way for glorified autocomplete machinery to impress?

Huh?

I thought the latest advance in computing (spring 2025, i.e. last year) was self-play / reinforcement learning, since we ran out of training data a few years ago.

https://github.com/OpenPipe/ART

That is, reinforcement learning where the large language model devises puzzles, solves them, and has its solutions scored via LLM-as-judge.

In LLM-as-judge, your LLM generates 8-12 trajectories and a different LLM judges the results. For generating assembly for a given ISA, I'd use an oracle instead: actually executing the output on a Windows or Linux operating system.
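To make the oracle idea concrete, here is a minimal sketch (not anything from ART): instead of a second LLM judging each trajectory, each candidate program is executed against test cases and rewarded by the fraction it gets right. The tiny register ISA, `run`, `oracle_reward`, the task, and the test cases are all hypothetical stand-ins; a real setup would assemble the model's output and run it on an actual Windows or Linux machine, ideally sandboxed.

```python
# Toy execution oracle: score candidate "assembly" by running it, not by asking a judge.
# HYPOTHETICAL: this ISA is made up for illustration; a real oracle would assemble
# and execute the output on an actual Windows or Linux machine.

REGISTERS = ("r0", "r1", "r2", "r3")


def run(program: str, x: int, y: int) -> int:
    """Execute a toy program with r0=x, r1=y; return r0. Raises ValueError if malformed."""
    regs = {r: 0 for r in REGISTERS}
    regs["r0"], regs["r1"] = x, y
    for raw in program.strip().splitlines():
        line = raw.split(";")[0].strip()  # drop comments and blank lines
        if not line:
            continue
        op, _, args = line.partition(" ")
        # Unpacking raises ValueError unless there are exactly two operands.
        dst, src = [a.strip() for a in args.split(",")]
        if dst not in regs:
            raise ValueError(f"bad destination register: {dst!r}")
        val = regs[src] if src in regs else int(src)  # register or immediate
        if op == "mov":
            regs[dst] = val
        elif op == "add":
            regs[dst] += val
        elif op == "sub":
            regs[dst] -= val
        elif op == "mul":
            regs[dst] *= val
        else:
            raise ValueError(f"unknown opcode: {op!r}")
    return regs["r0"]


def oracle_reward(program: str, cases: list[tuple[int, int, int]]) -> float:
    """Fraction of (x, y, expected) cases where the program's output matches."""
    passed = 0
    for x, y, expected in cases:
        try:
            if run(program, x, y) == expected:
                passed += 1
        except ValueError:
            pass  # malformed output earns nothing for this case
    return passed / len(cases)


# Task: r0 = x*y + x. Three hypothetical "trajectories" for the same prompt.
CASES = [(2, 3, 8), (0, 5, 0), (-4, 2, -12)]
candidates = [
    "mov r2, r0\nmul r0, r1\nadd r0, r2",  # correct
    "mul r0, r1\nadd r0, r0",              # computes 2*x*y instead
    "mov r0 r1",                           # missing comma: malformed
]
rewards = [oracle_reward(c, CASES) for c in candidates]
print(rewards)  # [1.0, 0.3333333333333333, 0.0]
```

The point of the design is that the reward is checked rather than opined on: the second candidate happens to pass the `(0, 5)` case by accident, which is why a real oracle would want many varied test inputs.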

The winning entries are used to train the large language model.
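One way the winning entries could feed back into training, sketched under assumptions: either keep only the top-scoring trajectories as fine-tuning examples (rejection sampling), or, GRPO-style, weight every trajectory by how its reward compares to the rest of its group. The `Trajectory` class, `group_advantages`, and `winners` are hypothetical illustrations, not ART's API.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trajectory:
    prompt: str
    completion: str
    reward: float  # e.g. from an execution oracle or an LLM judge


def group_advantages(group: list[Trajectory], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: reward minus the group mean, scaled by the group's std."""
    rewards = [t.reward for t in group]
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def winners(group: list[Trajectory], k: int = 1) -> list[Trajectory]:
    """Rejection-sampling style: keep the top-k trajectories with nonzero reward."""
    ranked = sorted(group, key=lambda t: t.reward, reverse=True)
    return [t for t in ranked[:k] if t.reward > 0]


# Hypothetical group of four trajectories sampled for the same prompt.
group = [
    Trajectory("p", "a", 1.0),
    Trajectory("p", "b", 0.0),
    Trajectory("p", "c", 0.5),
    Trajectory("p", "d", 0.0),
]
adv = group_advantages(group)
print([round(a, 2) for a in adv])
print([t.completion for t in winners(group)])  # ['a']
```

Note that a group where every trajectory gets the same reward yields all-zero advantages and so contributes no learning signal, which is one reason to sample several trajectories per prompt rather than one.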