Hacker News
by emp17344 26 days ago
If it’s not easily verifiable, LLMs aren’t good at it.
2 comments

I think that’s mostly because they get so much more of that reinforcement learning, since it is so economical. I don’t know if there is any evidence of a fundamental reason they can’t be just as good at other tasks, but it might be economically infeasible for a while yet.
No one is curating vast amounts of data for them in other domains. Programmers send programs with fixes
It's more about how costly it is to verify work in reinforcement learning. It is cheap in mathematics and coding because verification can be automated. It is expensive in other domains because, while you can capture certain datasets to do pre-training on, you ultimately need humans in the loop to judge the quality of the work.
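To make the cost argument concrete, here is a minimal sketch of what an automated verifier for a coding task could look like. The names (`verifiable_reward`, `good_add`, `buggy_add`, `tests`) are hypothetical and not taken from any RL library; the only point is that the reward comes from running checks, with no human judging the output.

```python
from typing import Any, Callable, Sequence, Tuple


# Hypothetical illustration: a binary reward computed purely by running tests.
def verifiable_reward(
    candidate: Callable[..., Any],
    test_cases: Sequence[Tuple[tuple, Any]],
) -> float:
    """Return 1.0 if the candidate passes every test case, else 0.0."""
    for args, expected in test_cases:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failure, same as a wrong answer.
            return 0.0
    return 1.0


# Two made-up "model outputs" for the task "add two integers".
def good_add(a, b):
    return a + b


def buggy_add(a, b):
    return a - b


tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

print(verifiable_reward(good_add, tests))
print(verifiable_reward(buggy_add, tests))
```

Scoring a million candidates this way costs only compute. For work whose quality a human has to judge, there is no equivalent of `tests`, and every reward signal costs someone's time.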
There's no diff of my Excel lambdas being fixed? :(
RLVR doesn’t work for unverifiable tasks, so they won’t be able to effectively use tools to boost reliability for those tasks.
Right, so you have to use RLHF. That is the economics problem I was referring to.
But what does it mean to be good at something that can't be verified? How do you know that they are not good at it? You are obviously using some measure.

Sounds like an oxymoron of a claim.

It means having taste. People say Picasso was a great painter, but that cannot be verified (at least, not in the sense of a verified reward).
"People say Picasso was a great painter" is definitely not hard to verify. lol.
I don't know if you're being facetious or not, but that was not what I meant. Picasso being a great painter is an example of "having taste"; "create an artistic image generation model with Picasso-level performance" is a valid problem statement we could attack with RLHF, but not with RLVR, because "taste" is not amenable to modeling with a reward function.

"Write this code in a way that is readable and maintainable" is another example.

The first paragraph ends with "[...] unleashing a flood of ill-informed reactions and muddled discourse. So, you know, it was just another day online."

It's almost as though it's not about the Monet.

You just threw the "easily" away from the comment you are replying to.
Doesn't make a difference to my comment.
There is a huge difference between "not verifiable" and "not easily verifiable".
No, because if OP is actually able to verify it (with difficulty), then AI can do it too.
No one in this thread appears to disagree. The issue is that RLHF is prohibitively expensive and the number of disciplines you could target is massive, so for reasons of economics rather than fundamental theory, AIs do not perform well on tasks that aren't amenable to RLVR. Even then, off-the-shelf LLMs are really only well aligned for programming.

In the paper I linked, they created a benchmark spanning 80 disciplines with tasks that could be checked automatically. So these are necessarily tasks that are tractable for RLVR; trivially, you could use performance against the benchmark as a reward function. The performance was still mediocre in everything but programming. And as we're seeing in this article, there is still room for growth in programming.
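A rough sketch of what "use performance against the benchmark as a reward function" could mean in practice. Everything here is made up for illustration (the `benchmark_reward` name, the toy answer key, the exact-match check); it is not the paper's grading code, just the general shape of an automatically checkable score.

```python
from typing import Callable, Dict


# Hypothetical illustration: fraction of benchmark tasks answered correctly,
# usable as a scalar reward because it needs no human grader.
def benchmark_reward(
    model_answer: Callable[[str], str],
    answer_key: Dict[str, str],
) -> float:
    if not answer_key:
        return 0.0
    correct = sum(
        1
        for prompt, expected in answer_key.items()
        if model_answer(prompt).strip() == expected.strip()
    )
    return correct / len(answer_key)


# Toy answer key and a toy "model" that gets one of four tasks wrong.
toy_key = {"2+2": "4", "3*3": "9", "10/2": "5", "7-4": "3"}
toy_outputs = {"2+2": "4", "3*3": "9", "10/2": "5", "7-4": "4"}


def toy_model(prompt: str) -> str:
    return toy_outputs.get(prompt, "")


print(benchmark_reward(toy_model, toy_key))
```

Exact string match is the crudest possible checker; the point is only that once a task can be graded like this, the grade can double as the training signal.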

In general, you seem to be reading very literally in some places (taking the statement "AIs aren't good at X" as applying to all AI and perpetually) and very loosely in others (disregarding "easily" as unimportant), and misinterpreting statements you appear to agree with as being in disagreement. I don't think there's a real disagreement here; I think there's a misunderstanding.

So you are saying an LLM is just as good as a human?