by idontknowmuch
547 days ago
What's your opinion on the veracity of this benchmark, given that o3 was fine-tuned and the others were not? Can you give more details on how much data was used to fine-tune o3? It's hard to put the results into perspective given this confounder.