| HN Mirror



	by proc0 847 days ago
	For the most part it's probably reinforcement learning from human feedback (RLHF). This puts humans in the training loop, and it's done for alignment purposes (which is overall a good idea, but it does depend on who exactly the AI is being aligned with). There may also be other places where human bias can seep in, like the massaging of the training data, but more likely the biggest factor is the direct feedback training done by a select group of people. https://aws.amazon.com/what-is/reinforcement-learning-from-h...
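	To make the "direct feedback" part concrete, here is a rough sketch of one common way the human labels get used: a reward model is trained on pairs of answers where a labeler picked a winner, using a Bradley-Terry style pairwise loss. This is an assumption about a typical setup, not a claim about what any particular lab does, and the function name and scores below are made up for illustration.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Low when the reward model already ranks the labeler's preferred
    answer higher, high when it ranks the rejected answer higher.
    """
    margin = reward_chosen - reward_rejected
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical reward-model scores for two answers to the same prompt.
loss_agree = preference_loss(2.0, 0.5)     # model agrees with the labeler
loss_disagree = preference_loss(0.5, 2.0)  # model disagrees with the labeler
print(loss_agree < loss_disagree)
```

	Training pushes the loss down, so the reward model drifts toward whatever the labelers preferred, and the main model is then tuned against that reward. That is why the makeup of the labeling group matters so much.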