by lyu07282 947 days ago
No, it's reinforcement learning from human feedback (RLHF), which takes lots of labeling.
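In the common RLHF setup, much of that labeling is people comparing pairs of model outputs and picking the better one. Those preferences train a reward model, and the RL step then optimizes the language model against it. Below is a minimal sketch of just the reward-model part. It uses the Bradley-Terry pairwise loss that RLHF write-ups usually describe, with a toy linear model over made-up feature vectors. The features, the data, and the function names are all hypothetical, so don't read this as how any particular lab actually does it.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def reward(w, features):
    # Toy linear reward model (hypothetical); real systems use a fine-tuned LM head.
    return sum(wi * xi for wi, xi in zip(w, features))

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    # Bradley-Terry: P(labeler prefers "chosen") = sigmoid(r_chosen - r_rejected)
    return -math.log(sigmoid(r_chosen - r_rejected))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    # pairs: list of (chosen_features, rejected_features) from human labels
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # d/d(margin) of -log(sigmoid(margin)) is -(1 - sigmoid(margin))
            g = -(1.0 - sigmoid(margin))
            for i in range(dim):
                w[i] -= lr * g * (chosen[i] - rejected[i])
    return w

# Hypothetical labeled comparisons over [helpfulness, toxicity] features.
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.3, 0.9]),
    ([0.9, 0.2], [0.4, 0.0]),
]
w = train_reward_model(pairs, dim=2)
```

Each labeled pair contributes only a single bit of preference, which helps explain why RLHF needs so much labeling.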