Hacker News
by nirvdrum 85 days ago
For anyone else unfamiliar with the term:

RLHF = Reinforcement Learning from Human Feedback

https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...