by Bjorkbat
347 days ago
I recall it as less an evolution and more a complete tonal shift the moment o3 was evaluated on ARC-AGI. I remember Sam making some dumb post on Twitter suggesting they had beaten the benchmark internally, and Francois calling him out on his vagueposting. As soon as they publicly released the scores, it was like he was all-in on reasoning. Which, I have to admit, I was kind of disappointed by.