HN Mirror



	by VivaLaPanda 849 days ago
	It's almost certainly the RLHF, not the base model.


viscanti 849 days ago

But the base model, when it's trained on the whole internet, will have some extreme biases on topics where there's a large and vocal group on one side and the other side is mostly silent. So RLHF is the attempt to correct for the biases on the internet.


leadingthenet 849 days ago

> So RLHF is the attempt to correct for the biases on the internet.

...or it can be used to reinforce a specific ideology. It's completely dependent on who does the RLHF and what their motivations are.
