| HN Mirror


	by dominotw 25 days ago
	But what does it mean to be good at something that can't be verified? How do you know that they are not good at it? You are obviously using some measure. Sounds like an oxymoron of a claim.

2 comments

maxbond 25 days ago

It means having taste. People say Picasso was a great painter, but that cannot be verified (at least, not in the sense of a verified reward).

dominotw 25 days ago

"People say Picasso was a great painter" is definitely not hard to verify. lol.

maxbond 25 days ago

I don't know if you're being facetious or not, but that was not what I meant. Picasso being a great painter is an example of "having taste". "Create an artistic image generation model with Picasso-level performance" is a valid problem statement we could attack with RLHF, but not with RLVR, because "taste" is not amenable to modeling with a reward function.

"Write this code in a way that is readable and maintainable" is another example.

dominotw 25 days ago

https://futurism.com/artificial-intelligence/real-monet-ai-c...

logifail 25 days ago

The first paragraph ends with "[...] unleashing a flood of ill-informed reactions and muddled discourse. So, you know, it was just another day online."

It's almost as though it's not about the Monet.

marcosdumay 25 days ago

You just threw the "easily" away from the comment you are replying to.

dominotw 25 days ago

Doesn't make a difference to my comment.

Geezus_42 25 days ago

There is a huge difference between "not verifiable" and "not easily verifiable".

dominotw 25 days ago

No, because if OP is actually able to verify it (with difficulty), then AI can do it too.

maxbond 24 days ago

No one in this thread appears to disagree. The issue is that RLHF is prohibitively expensive and the number of disciplines you could target is massive. So for reasons of economics rather than fundamental theory, AIs do not perform well on tasks that aren't amenable to RLVR, and even then off-the-shelf LLMs are really only well aligned for programming.

In the paper I linked they created a benchmark spanning 80 disciplines with tasks that could be checked automatically. So these are necessarily tasks that are tractable for RLVR; trivially, you could use performance against the benchmark as a reward function. The performance was still mediocre in everything but programming. And as we're seeing in this article, there is still room for growth in programming.
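
As a rough illustration of the "benchmark as reward function" idea, here is a minimal sketch. It is not taken from the paper: `Task`, `verifiable_reward`, and `benchmark_reward` are hypothetical names, and the two toy tasks stand in for real benchmark items. The point is only that when every task ships with an automatic checker, the pass rate is itself a scalar reward with no human rater in the loop.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Task:
    """Hypothetical benchmark item: a prompt plus an automatic checker."""
    prompt: str
    check: Callable[[str], bool]  # returns True if the answer is correct


def verifiable_reward(task: Task, answer: str) -> float:
    """Binary reward: 1.0 if the checker accepts the answer, else 0.0."""
    return 1.0 if task.check(answer) else 0.0


def benchmark_reward(tasks: Sequence[Task], policy: Callable[[str], str]) -> float:
    """Mean verifiable reward of a policy over a set of tasks (the pass rate)."""
    if not tasks:
        return 0.0
    total = sum(verifiable_reward(t, policy(t.prompt)) for t in tasks)
    return total / len(tasks)


# Toy tasks (assumed for illustration only).
TASKS = [
    Task("What is 2 + 2?", lambda a: a.strip() == "4"),
    Task("Reverse the string 'abc'.", lambda a: a.strip() == "cba"),
]

# A stand-in "model" that gets one task right and one wrong.
CANNED = {"What is 2 + 2?": "4", "Reverse the string 'abc'.": "abc"}


def toy_policy(prompt: str) -> str:
    return CANNED.get(prompt, "")


if __name__ == "__main__":
    print(benchmark_reward(TASKS, toy_policy))
```

"Write this in a way that is readable and maintainable" has no obvious `check` function to plug in here, which is the gap between RLVR and RLHF being discussed.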

In general you seem to be reading very literally in some places (taking the statement "AIs aren't good at X" as applying to all AI and perpetually) and very loosely in others (disregarding "easily" as unimportant), and misinterpreting statements you appear to agree with as being in disagreement. I don't think there's a real disagreement here; I think there's a misunderstanding.

Geezus_42 20 days ago

So you are saying an LLM is just as good as a human?
