by ultrasaurus
814 days ago
I wonder how well Elo scores handle the edge case where your most important games are against yourself. There are 4 GPT-4s in the top 10 (including both #2 and #3) and 3 Claudes. (To their credit, they count anything where the 95% confidence intervals overlap as a tie.)
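For what it's worth, the "overlapping 95% confidence intervals count as a tie" rule is easy to sketch. This is only my reading of it, not the leaderboard's actual code; the model names, ratings, and intervals below are made up for illustration.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a=(lo, hi) and b=(lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_tie(model_a, model_b):
    """Treat two models as tied when their 95% confidence intervals overlap."""
    return intervals_overlap(model_a["ci"], model_b["ci"])


# Hypothetical entries: rating plus a (lower, upper) 95% interval.
models = {
    "variant_a": {"rating": 1250, "ci": (1240, 1260)},
    "variant_b": {"rating": 1245, "ci": (1236, 1254)},
    "other": {"rating": 1200, "ci": (1190, 1210)},
}

print(is_tie(models["variant_a"], models["variant_b"]))  # intervals overlap
print(is_tie(models["variant_a"], models["other"]))      # intervals are disjoint
```

One thing to keep in mind is that a tie defined this way is not transitive: A can tie B and B can tie C while A and C are clearly separated.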