Hacker News
Batched reward model inference and Best-of-N sampling (raw.sh)
34 points by rawsh 582 days ago