| HN Mirror


by maxbond 27 days ago

Reminds me of the recent paper about delegating document editing tasks to LLMs across different disciplines [1]. That paper found that programming was the only discipline in which most LLMs can perform long-horizon tasks without accumulating errors & corrupting the document.

I've only read the abstract of this one so far, but it seems like this paper has zoomed in on programming with greater fidelity and shown a similar phenomenon. It's not about long-horizon tasks, though; it's more like "long style horizons" of larger sets of structural constraints.

[1] https://arxiv.org/abs/2604.15597

Discussion: https://news.ycombinator.com/item?id=48073246


emp17344 27 days ago

If it’s not easily verifiable, LLMs aren’t good at it.

jeremyjh 27 days ago

I think that’s mostly because they get so much more of that reinforcement learning - since it is so economical. I don’t know if there is any evidence of a fundamental reason they can’t be just as good at other tasks, but it might be economically infeasible for a while yet.

mjburgess 27 days ago

No one is curating vast amounts of data for them in other domains. Programmers send programs with fixes.

jeremyjh 25 days ago

It’s more about how costly it is to verify work in reinforcement learning. It is cheap in mathematics and coding because verification can be automated. It is expensive in other domains because, while you can capture certain datasets to do pre-training on, you ultimately need humans in the loop to judge the quality of work.
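
To make "can be automated" concrete, here is a minimal sketch of what a verifiable reward for code might look like, assuming a hypothetical grader that simply runs test cases. The names `verifiable_reward`, `CASES`, `good` and `buggy` are made up for illustration and do not come from any RL library.

```python
from typing import Callable, Iterable, Tuple


def verifiable_reward(candidate: Callable[[int], int],
                      cases: Iterable[Tuple[int, int]]) -> float:
    """Hypothetical reward: fraction of test cases the candidate passes."""
    cases = list(cases)
    passed = 0
    for arg, expected in cases:
        try:
            if candidate(arg) == expected:
                passed += 1
        except Exception:
            # A crash counts as a failed case; the grader itself keeps going.
            pass
    # No cases means nothing was verified, so give no reward.
    return passed / len(cases) if cases else 0.0


# Hypothetical task: "return the square of n".
CASES = [(0, 0), (2, 4), (-3, 9)]


def good(n: int) -> int:
    return n * n


def buggy(n: int) -> int:
    return n * 2  # passes some cases by coincidence, fails on -3


print(verifiable_reward(good, CASES))
print(verifiable_reward(buggy, CASES))
```

No human is needed to score either candidate, so this can be run as many times as you like. For "is this essay good" there is no such function to call, which is where the human judges come in.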

knollimar 26 days ago

There's no diff of my Excel lambdas being fixed? :(

emp17344 26 days ago

RLVR doesn’t work for unverifiable tasks, so they won’t be able to effectively use tools to boost reliability for those tasks.

jeremyjh 25 days ago

Right, so you have to use RLHF. That is the economics problem I was referring to.

dominotw 26 days ago

But what does it mean to be good at something that can't be verified? How do you know that they are not good at it? You are obviously using some measure.

Sounds like an oxymoron of a claim.

maxbond 26 days ago

It means having taste. People say Picasso was a great painter, but that cannot be verified (at least, not in the sense of a verified reward).

dominotw 26 days ago

"People say Picasso was a great painter" is definitely not hard to verify, lol.

maxbond 26 days ago

I don't know if you're being facetious or not, but that was not what I meant. Picasso being a great painter is an example of "having taste"; "create an artistic image generation model with Picasso-level performance" is a valid problem statement we could attack with RLHF, but not with RLVR, because "taste" is not amenable to modeling with a reward function.

"Write this code in a way that is readable and maintainable" is another example.

dominotw 26 days ago

https://futurism.com/artificial-intelligence/real-monet-ai-c...

marcosdumay 26 days ago

You just threw the "easily" away from the comment you are replying to.

dominotw 26 days ago

Doesn't make a difference to my comment.

Geezus_42 26 days ago

There is a huge difference between "not verifiable" and "not easily verifiable".

dominotw 26 days ago

No, because if OP is actually able to verify it (with difficulty), then AI can do it too.