Hacker News
Batched reward model inference and Best-of-N sampling (raw.sh)
34 points by rawsh 582 days ago