by bturtel
489 days ago
Great question! The key advantage of self-play is that it works even though we don't actually have labels for the "right" probability to assign to any given question, only binary outcomes: each event either happened (1.0) or did not happen (0.0). Our thinking was that by generating multiple predictions and ranking them by proximity to the ground truth, self-play incentivizes each agent to produce more finely calibrated probabilities, or else the other agent might come just slightly closer to the actual outcome.
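
To make the ranking idea concrete, here is a minimal sketch in Python. It is only an illustration, not our actual pipeline: the function names `rank_by_proximity` and `preference_pairs` are hypothetical, and using absolute distance to the outcome as the proximity measure is an assumption.

```python
def rank_by_proximity(predictions, outcome):
    """Sort probability forecasts from closest to farthest from the
    binary outcome (1.0 = happened, 0.0 = did not happen).

    Assumption: "proximity" is plain absolute distance.
    """
    return sorted(predictions, key=lambda p: abs(p - outcome))


def preference_pairs(predictions, outcome):
    """Build (better, worse) pairs from a set of self-play forecasts.

    Exact ties carry no ranking signal, so they are skipped.
    """
    ranked = rank_by_proximity(predictions, outcome)
    pairs = []
    for i, better in enumerate(ranked):
        for worse in ranked[i + 1:]:
            if abs(better - outcome) < abs(worse - outcome):
                pairs.append((better, worse))
    return pairs


# The event happened, so the forecast nearest 1.0 ranks first.
print(rank_by_proximity([0.6, 0.9, 0.3], outcome=1.0))

# Even a small edge matters: 0.72 beats 0.70 when the event happened.
print(preference_pairs([0.70, 0.72], outcome=1.0))
```

The second call shows the incentive described above: a forecast that is only slightly closer to the outcome still wins the pair, so each agent is pushed toward finer-grained probabilities rather than round numbers.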