by maxbond 27 days ago
Reminds me of the recent paper about delegating document editing tasks to LLMs across different disciplines [1]. That paper found that programming was the only discipline in which most LLMs can perform long-horizon tasks without accumulating errors and corrupting the document.

I've only read the abstract of this one so far, but it seems this paper has zoomed in on programming with greater fidelity and shown a similar phenomenon. It's not about long-horizon tasks, though; it's more about "long style horizons", i.e. larger sets of structural constraints.

[1] https://arxiv.org/abs/2604.15597

Discussion: https://news.ycombinator.com/item?id=48073246


If it’s not easily verifiable, LLMs aren’t good at it.
I think that’s mostly because they get so much more reinforcement learning on verifiable tasks, since it is so economical there. I don’t know if there is any evidence of a fundamental reason they can’t be just as good at other tasks, but it might be economically infeasible for a while yet.
No one is curating vast amounts of data for them in other domains. Programmers send programs with fixes.
It's more about how costly it is to verify work in reinforcement learning. It is cheap in mathematics and coding because verification can be automated. It is expensive in other domains because, while you can capture certain datasets to do pre-training on, you ultimately need humans in the loop to judge the quality of the work.
There's no diff of my Excel lambdas being fixed? :(
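To make the "can be automated" point concrete, here is a minimal sketch of what a verifiable reward might look like for code. The function name `verifiable_reward` and the test-case format are hypothetical, not taken from any particular RL framework, and a real pipeline would run candidates in a sandbox rather than in-process.

```python
from typing import Callable, Sequence, Tuple


def verifiable_reward(
    candidate: Callable[[int], int],
    cases: Sequence[Tuple[int, int]],
) -> float:
    """Score a candidate function by the fraction of test cases it passes.

    Hypothetical illustration: no human judgment is needed, which is what
    makes this kind of reward cheap to compute at scale.
    """
    if not cases:
        return 0.0
    passed = 0
    for arg, expected in cases:
        try:
            if candidate(arg) == expected:
                passed += 1
        except Exception:
            # A crashing candidate simply fails that case.
            pass
    return passed / len(cases)


# Toy usage: score two candidate implementations of "square a number".
cases = [(2, 4), (3, 9), (-1, 1)]
good_score = verifiable_reward(lambda x: x * x, cases)
bad_score = verifiable_reward(lambda x: x + x, cases)
```

There is no equivalent one-liner for "is this essay good?", which is where the human-in-the-loop cost comes from.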
RLVR doesn’t work for unverifiable tasks, so they won’t be able to effectively use tools to boost reliability for those tasks.
Right, so you have to use RLHF. That is the economics problem I was referring to.
But what does it mean to be good at something that can't be verified? How do you know that they are not good at it? You are obviously using some measure.

Sounds like an oxymoron of a claim.

It means having taste. People say Picasso was a great painter, but that cannot be verified (at least, not in the sense of a verified reward).
"People say Picasso was a great painter" is definitely not hard to verify, lol.
I don't know if you're being facetious or not, but that was not what I meant. Picasso being a great painter is an example of "having taste"; "create an artistic image generation model with Picasso-level performance" is a valid problem statement we could attack with RLHF, but not with RLVR, because "taste" is not amenable to modeling with a reward function.

"Write this code in a way that is readable and maintainable" is another example.

You just threw away the "easily" from the comment you are replying to.
Doesn't make a difference to my comment.
There is a huge difference between "not verifiable" and "not easily verifiable".
No, because if OP is actually able to verify it (with difficulty), then AI can do it too.