Hacker News
by ben_w 809 days ago
RLHF, so far as I can see.

It's the same reinforcement learning from human feedback, with positive and negative signals, that was used to train them for chat and task completion rather than just autocomplete in the first place.