Batched reward model inference and Best-of-N sampling

Batched reward model inference and Best-of-N sampling (raw.sh)
34 points by rawsh 582 days ago