| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by gbjw 1304 days ago
	A reward is not a constraint. In the language of modern ml, rewards 'encourage' models to produce certain constrained outputs. The actual outputs during inference can be arbitrarily poor in spite of the added 'reward' during training.