https://twitter.com/natfriedman/status/1625850766039842824?c...
RLHF = Reinforcement Learning from Human Feedback