by marsh_mellow
426 days ago
Good point. They said they validated the results by testing with other models (including Claude) as well as with manual sanity checks. 55% to 45% definitely isn't a blowout, but it is meaningful: in Elo terms it equates to about a 35-point difference. So not in a different league, but definitely a clear edge.
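
For anyone who wants to check the conversion: here's a quick sketch, assuming the standard logistic Elo model with a 400-point scale and treating the 55% as a head-to-head win rate with no ties. The function name `elo_diff` is just something I made up for illustration.

```python
import math

def elo_diff(win_rate: float) -> float:
    """Rating gap implied by a head-to-head win rate.

    Assumes the standard logistic Elo model (400-point scale),
    where expected score = 1 / (1 + 10 ** (-diff / 400)).
    Solving that for diff gives the expression below.
    """
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))

print(round(elo_diff(0.55), 1))  # 34.9
```

If ties were counted differently in the original comparison, the implied gap would shift a bit, so treat this as a ballpark.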