
I think if you are able to define a reward function then it sort of doesn’t matter how hard it was to do that - if you can’t then RLHF is your only option.

For example, say you’re building a chess AI that you’re going to train with reinforcement learning, AlphaZero-style. No matter how fancy the logic you want to employ to build the AI itself, it’s really easy to make a reward function. “Did it win the game?” is the reward function.
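To make the "did it win the game" point concrete, here is a minimal sketch of such a reward. The function name `terminal_reward` and the "white"/"black"/None encoding of the result are assumptions made for illustration, not any particular engine's API, and scoring a draw as 0 is a common convention rather than something the comment specifies.

```python
from typing import Optional


def terminal_reward(winner: Optional[str], player: str) -> float:
    """Reward for `player` once the game has ended.

    `winner` is "white", "black", or None for a draw (hypothetical encoding).
    """
    if winner is None:
        return 0.0  # draw: neither side is rewarded (assumed convention)
    return 1.0 if winner == player else -1.0


# The same finished game gives opposite rewards to the two sides.
print(terminal_reward("white", "white"))
print(terminal_reward("white", "black"))
```

The whole reward fits in three lines, and nothing in it depends on how the agent chose its moves.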

On the other hand, if you’re making an AI to write poetry, it’s hard or impossible to come up with an objective function to judge the output, so you use RLHF.

In lots of cases the whole design springs from the fact that it’s hard to make a suitable reward function (GANs for generating realistic faces are the classic example). What makes an image of a face realistic? So Goodfellow came up with the idea of having two nets: one that tries to generate images and one that tries to discern which images are fake and which are real. Now the reward functions are easy. The generator gets rewarded for generating images good enough to fool the classifier, and the classifier gets rewarded for spotting which images are fake and which are real.
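As a rough sketch of how easy the two GAN objectives become, here they are written as losses (lower is better, so "reward" is just the negative). The function names are made up for illustration, the inputs are assumed to be the classifier's "this is real" probabilities strictly between 0 and 1, and the generator uses the common non-saturating form rather than the exact minimax term.

```python
import math
from typing import Sequence


def discriminator_loss(p_real: Sequence[float], p_fake: Sequence[float]) -> float:
    """Low when real images score near 1 and generated images score near 0."""
    real_term = -sum(math.log(p) for p in p_real) / len(p_real)
    fake_term = -sum(math.log(1.0 - p) for p in p_fake) / len(p_fake)
    return real_term + fake_term


def generator_loss(p_fake: Sequence[float]) -> float:
    """Low when the classifier is fooled, i.e. generated images score near 1."""
    return -sum(math.log(p) for p in p_fake) / len(p_fake)


# A classifier that separates real from fake well has a lower loss than one
# that outputs 0.5 for everything.
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))

# The generator does better when its images are scored as likely real.
print(generator_loss([0.9]), generator_loss([0.1]))
```

Neither function needs a definition of "realistic"; all the judgment lives in the classifier's output.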