Hacker News new | ask | show | jobs
by maxbond 25 days ago
It means having taste. People say Picasso was a great painter, but that cannot be verified (at least, not in the sense of a verified reward).
1 comments

"people say picasso was a great painter" is definitely not hard to verify . lol.
I don't know if you're being factitious or not but that was not what I meant. Picasso being a great painter is an example of "having taste"; "create an artistic image generation model with Picasso-level performance" is a valid problem statement we could attack with RLHF, but not with RLVR, because "taste" is not amenable to modeling with a reward function.

"Write this code in a way that is readable and maintainable" is another example.

The first paragraph ends with "[...] unleashing a flood of ill-informed reactions and muddled discourse. So, you know, it was just another day online."

It's almost as though it's not about the Monet.