by vlovich123
188 days ago
One classic problem across ML is making sure the benchmark is representative and that the algorithm isn’t overfitting to it. This remains an open problem for LLMs: we don’t have true AGI benchmarks, and LLMs frequently learn the benchmark problems without necessarily getting much better in the real world. Gemini 3 has been hailed precisely because it has delivered huge gains across the board that don’t come from overfitting to benchmarks.