by LarsDu88 35 days ago
It's extremely verifiable. The reinforcement finetuning strategy I'm referring to involves an LLM creating coding tasks with an expected output, implementing the code, and then having a compiler (or an interpreter, in the case of languages like Python) succeed or fail to run the code. The program's output is then compared to the expected output. The verification process (run interpreter + run test) can be done in seconds. One can generate millions of training examples like this for free, and there is extensive research showing with the right policy, an agent will be able to learn to reason - first as good as human, and in many cases superior to a human.
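A minimal sketch of the verify step described above: run the generated code in an interpreter, then compare its output to the expected output. The names `verify` and `timeout_s` and the 1.0/0.0 reward are illustrative assumptions, not taken from any particular RL framework, and a real pipeline would run untrusted code in a sandbox rather than a bare subprocess.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def verify(candidate_code: str, expected_output: str, timeout_s: float = 5.0) -> float:
    """Hypothetical reward function: 1.0 if the code runs cleanly and
    prints the expected output, 0.0 otherwise."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(candidate_code)
        try:
            # Step 1: the interpreter either succeeds or fails to run the code.
            # Assumption: no sandboxing here; do not use on untrusted code as-is.
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Code that hangs counts as a failure.
            return 0.0

    if result.returncode != 0:
        return 0.0

    # Step 2: compare actual output to expected output (ignoring edge whitespace).
    return 1.0 if result.stdout.strip() == expected_output.strip() else 0.0


if __name__ == "__main__":
    print(verify("print(2 + 3)", "5"))  # passing candidate
    print(verify("print(2 + 2)", "5"))  # wrong output
    print(verify("print(", "5"))        # fails to run
```

Each call is one (task, candidate, expected output) triple; the resulting scalar is what a policy-gradient-style trainer would consume as the reward.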
2 comments

For basic primitives with known output it’s verifiable, but as long as you’re dealing with real systems with tons of inputs and side effects this no longer holds true.

> research showing with the right policy,

Rest of the owl.

> It's extremely verifiable.

Only if you fully detail the behavior of the system... and at that point, why use a chatbot? You've coded the entire thing.

> first as good as human

We'll see. Chatbots are only as capable as you detail them to be.