by kcorbitt
338 days ago
No, we don't do anything about it right now. In theory we could judge several times with different orderings. Measuring order bias would be really easy, though: we just need to look at the average score by rollout position across many runs. I'll add that to my list of experiments!
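For what it's worth, the position check could look roughly like the sketch below. It is only an illustration: the function name `mean_score_by_position`, the data layout (one list of judge scores per run, indexed by the position the rollout was shown in), and the example numbers are all assumptions, not anything from the actual pipeline.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_position(runs):
    """Average judge score at each rollout position.

    `runs` is a hypothetical layout: runs[i][j] is the score the judge gave
    to the rollout shown at position j in run i. Runs may differ in length.
    """
    by_position = defaultdict(list)
    for scores in runs:
        for position, score in enumerate(scores):
            by_position[position].append(score)
    # Sort by position so the result reads first slot, second slot, and so on.
    return {pos: mean(vals) for pos, vals in sorted(by_position.items())}


# Made-up scores, purely for illustration.
example_runs = [
    [0.9, 0.5, 0.4],
    [0.7, 0.6, 0.2],
    [0.8, 0.4, 0.3],
]

position_means = mean_score_by_position(example_runs)
for pos, avg in position_means.items():
    print(f"position {pos}: mean score {avg:.2f}")
```

A consistent gap between positions only points to order bias if rollouts land in positions independently of their quality, for example by shuffling them before each judging call. Otherwise the gap could just reflect which rollouts tend to come first.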