by maxrmk 335 days ago
Very cool. Do you do anything to mitigate ordering bias in the evaluation function, or do you just expect it to average out over time?

No, we don't do anything to mitigate it right now. In theory, we could judge each group several times with different orderings.
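A minimal sketch of that re-judging idea, not what we actually run. It assumes a hypothetical `judge` callable that takes a list of rollouts and returns one score per rollout in the same order; the function name, `n_orderings`, and `seed` are made up for illustration. Averaging the scores across orderings is one possible way to combine them.

```python
import random
from typing import Callable, List, Sequence


def judge_with_shuffles(
    rollouts: Sequence[str],
    judge: Callable[[List[str]], List[float]],  # hypothetical judge function
    n_orderings: int = 4,
    seed: int = 0,
) -> List[float]:
    """Score rollouts under several random orderings and average per rollout."""
    rng = random.Random(seed)
    totals = [0.0] * len(rollouts)
    for _ in range(n_orderings):
        # Pick a random presentation order for this judging pass.
        order = list(range(len(rollouts)))
        rng.shuffle(order)
        scores = judge([rollouts[i] for i in order])
        # Map each score back to the rollout it belongs to.
        for position, original_index in enumerate(order):
            totals[original_index] += scores[position]
    return [total / n_orderings for total in totals]
```

The cost is `n_orderings` judge calls per group instead of one, so this trades extra inference for less sensitivity to presentation order.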

We could measure order bias really easily though; we just need to look at the average score by rollout position across many runs. I'll add that to my list of experiments!
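Roughly, the measurement could look like the sketch below. It assumes each run is logged as a list of judge scores in the order the rollouts were shown to the judge; `mean_score_by_position` and that log format are hypothetical. A flat profile suggests little order bias, but only if rollout quality is unrelated to position (for example, when the order is effectively random).

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def mean_score_by_position(runs: Sequence[Sequence[float]]) -> List[float]:
    """Average judge score at each rollout position across many runs."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for scores in runs:
        # scores[i] is the score given to the rollout shown at position i.
        for position, score in enumerate(scores):
            sums[position] += score
            counts[position] += 1
    # Runs may have different group sizes, so divide by per-position counts.
    return [sums[p] / counts[p] for p in sorted(counts)]
```

With enough runs, a consistently higher mean at one position (say, the first slot) would be a sign the judge favors it.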