Hacker News
by edelans 1124 days ago
RLHF = Reinforcement learning from human feedback

https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...