| HN Mirror

	by satvikpendem 26 days ago
	RL on users' harness inputs and outputs is one of the primary drivers of model performance improvement, a self-perpetuating flywheel.