by squimmy26
177 days ago
How certain can we be that these improvements aren't just the result of Gemini 3 Pro being pre-trained on endless internet writeups of where 2.5 struggled (and, almost certainly, of what a human would have done instead)? In other words, how much of this improvement is true generalization vs. memorization?