| HN Mirror



	by emp17344 26 days ago
	If it’s not easily verifiable, LLMs aren’t good at it.

2 comments

jeremyjh 26 days ago

I think that’s mostly because they get so much more reinforcement learning on verifiable tasks, since it is so economical. I don't know if there is any evidence of a fundamental reason they can’t be just as good at other tasks, but it might be economically infeasible for a while yet.

mjburgess 25 days ago

No one is curating vast amounts of data for them in other domains. Programmers send programs with fixes.

jeremyjh 24 days ago

It's more about how costly it is to verify work in reinforcement learning. It is cheap in mathematics and coding because it can be automated. It is expensive in other domains because, while you can capture certain datasets to do pre-training on, you ultimately need humans in the loop to judge the quality of work.
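
To illustrate why verification is cheap for code, here is a minimal sketch of an automated reward check. It is not taken from any real training pipeline: `verifiable_reward`, the `solve` function name, and the test cases are all hypothetical names made up for this example.

```python
from typing import Sequence, Tuple


def verifiable_reward(
    candidate_src: str,
    tests: Sequence[Tuple[tuple, object]],
    func_name: str = "solve",  # hypothetical convention: candidate defines `solve`
) -> float:
    """Return the fraction of test cases the candidate passes.

    A candidate that fails to load (syntax error, missing function) scores 0.0.
    """
    namespace: dict = {}
    try:
        # NOTE: exec on untrusted model output needs a real sandbox;
        # this is only acceptable in a toy illustration.
        exec(candidate_src, namespace)
        func = namespace[func_name]
    except Exception:
        return 0.0

    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            # A crashing test case simply counts as a failure.
            pass
    return passed / len(tests) if tests else 0.0


# Hypothetical usage: score two candidate "model outputs" with no human involved.
tests = [((1, 2), 3), ((0, 0), 0), ((2, 2), 4)]
good = "def solve(a, b):\n    return a + b\n"
bad = "def solve(a, b):\n    return a - b\n"
print(verifiable_reward(good, tests))
print(verifiable_reward(bad, tests))
```

The whole check runs without a person in the loop, which is the point. For something like "is this prose well written?" there is no equivalent of `tests`, so a human (or a model trained on human judgments) has to supply the score.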

knollimar 25 days ago

There's no diff of my Excel lambdas being fixed? :(

emp17344 25 days ago

RLVR doesn’t work for unverifiable tasks, so they won’t be able to effectively use tools to boost reliability for those tasks.

jeremyjh 24 days ago

Right, so you have to use RLHF. That is the economics problem I was referring to.

dominotw 25 days ago

But what does it mean to be good at something that can't be verified? How do you know that they are not good at it? You are obviously using some measure.

Sounds like an oxymoron of a claim.

maxbond 25 days ago

It means having taste. People say Picasso was a great painter, but that cannot be verified (at least, not in the sense of a verified reward).

dominotw 25 days ago

"people say picasso was a great painter" is definitely not hard to verify . lol.

maxbond 25 days ago

I don't know if you're being facetious or not, but that was not what I meant. Picasso being a great painter is an example of "having taste"; "create an artistic image generation model with Picasso-level performance" is a valid problem statement we could attack with RLHF, but not with RLVR, because "taste" is not amenable to modeling with a reward function.

"Write this code in a way that is readable and maintainable" is another example.

dominotw 25 days ago

https://futurism.com/artificial-intelligence/real-monet-ai-c...

logifail 25 days ago

The first paragraph ends with "[...] unleashing a flood of ill-informed reactions and muddled discourse. So, you know, it was just another day online."

It's almost as though it's not about the Monet.

marcosdumay 25 days ago

You just threw the "easily" away from the comment you are replying to.

dominotw 25 days ago

Doesn't make a difference to my comment.

Geezus_42 25 days ago

There is a huge difference between "not verifiable" and "not easily verifiable".

dominotw 25 days ago

No, because if OP is actually able to verify it (with difficulty), then AI can do it too.

maxbond 24 days ago

No one in this thread appears to disagree. The issue is that RLHF is prohibitively expensive and the number of disciplines you could target is massive. So, for reasons of economics rather than fundamental theory, AIs do not perform well on tasks that aren't amenable to RLVR, and even then off-the-shelf LLMs are really only well aligned for programming.

In the paper I linked they created a benchmark spanning 80 disciplines with tasks that could be checked automatically. So these are necessarily tasks that are tractable for RLVR; trivially, you could use performance against the benchmark as a reward function. The performance was still mediocre in everything but programming. And as we're seeing in this article, there is still room for growth in programming.

In general you seem to be reading very literally in some places (taking the statement "AIs aren't good at X" as applying to all AI and perpetually) and very loosely in others (disregarding "easily" as unimportant), and misinterpreting statements you appear to agree with as being in disagreement. I don't think there's a real disagreement here; I think there's a misunderstanding.

Geezus_42 20 days ago

So you are saying an LLM is just as good as a human?