Generalized on-policy distillation with reward extrapolation

	Generalized on-policy distillation with reward extrapolation (arxiv.org)
	3 points by fzliu 125 days ago