by version_five 1118 days ago
The P should be an F: it's reinforcement learning from human feedback (RLHF).