HN Mirror



	by ben_w 809 days ago
	RLHF, so far as I can see: the same positive/negative reinforcement learning from human feedback that was used to train them for chat/task completion, rather than just autocomplete, in the first place.