> Does anyone have any insight into why reinforcement learning is (maybe) required/historically favoured?

Conceptually, it has an attractive similarity to the way people learn in real life (rewarded for success, punished for failure). We know that resembling nature doesn’t guarantee better results than the alternatives (for example, a modern airplane does not “flap” its wings the way a bird does), but natural solutions will keep being looked to as a starting point and as a tool to try on new problems.

Additionally, RL gives you a good start on problems where it’s unclear how to proceed. In spaces where the only obvious way to optimize is to take actions and see how they score against some metric, reinforcement learning often provides a good mental and code framework for attacking the problem.
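
To make the "take actions and see how they score" loop concrete, here is a minimal sketch of an epsilon-greedy multi-armed bandit, about the simplest RL-style setup there is. The function name, the Gaussian reward model, and the parameter values are all assumptions picked for illustration, not anything from the discussion above.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: learn which arm pays best purely by trying arms.

    true_means is hidden from the learner; it only sees sampled rewards.
    (Hypothetical toy setup: rewards are Gaussian with unit variance.)
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms       # how often each arm has been pulled
    estimates = [0.0] * n_arms  # running mean reward observed per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The "metric": a noisy reward for the chosen action.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the running mean for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = run_bandit([0.0, 0.5, 2.0])
```

Nothing here needs a model of why an arm is good. The learner only needs to be able to act and to be scored, which is what makes the framing handy when you don’t know where else to start.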

> There was a paper recently suggesting that you can use a preference learning objective directly

From a very quick skim, it looks like that paper argues that, rather than giving rewards or punishments based on preferences, you can just build a predictive classifier for the kinds of responses humans prefer. It seems interesting, though I wonder to what extent you would still have to occasionally do reinforcement learning to generate relevant data for evaluating the classifier.
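
To illustrate what "a predictive classifier for preferred responses" could look like, here is a rough sketch of a generic Bradley-Terry style pairwise logistic model. To be clear, this is not the method from the paper (I only skimmed it). The two hand-made features, the toy pairs, and all function names are made up for the example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_preference_model(pairs, n_features, lr=0.1, epochs=200):
    """Fit weights w so that sigmoid(w . (preferred - rejected)) is high.

    pairs: list of (preferred_features, rejected_features) tuples.
    Trained with plain stochastic gradient ascent on the log-likelihood.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in pairs:
            diff = [p - r for p, r in zip(preferred, rejected)]
            p_correct = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
            # Gradient of log(sigmoid(w . diff)) w.r.t. w is (1 - p) * diff.
            for i in range(n_features):
                w[i] += lr * (1.0 - p_correct) * diff[i]
    return w


def prob_a_preferred(w, a, b):
    """Predicted probability that response a is preferred over response b."""
    return sigmoid(sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b)))


# Hypothetical features per response: [helpfulness, rudeness].
toy_pairs = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.3, 0.3]),
]
weights = fit_preference_model(toy_pairs, n_features=2)
```

Note that the training loop is ordinary supervised learning on a fixed set of labeled comparisons, with no sampling of actions or reward signal. The open question from above still stands: the comparisons have to come from somewhere.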