by simianwords
217 days ago
“When researchers tested the same performance on a new set of benchmark questions, they noticed that models experienced ‘significant performance drops.’” This is very misleading, because the generalisation ability of LLMs is very high. They don’t just memorise problems; that’s nonsense. At high school level maths, you genuinely can’t get GPT-5 Thinking to make a single mistake. Not possible at all, unless you give it some convoluted, ambiguous prompt that no human could understand. If you assume I’m correct, how could GPT be memorising, then? In fact, even undergraduate-level mathematics is quite simple for GPT-5 Thinking. IMO gold was won by what? Memorising solutions? I challenge people to find ONE example that GPT-5 Thinking gets wrong in high school or undergrad level maths. I could not find one. You must allow all tools, though.
If you don't think that's the case, I think it's up to you to show that it's not.
___________________
[1] GSM8K leaderboard: https://llm-stats.com/benchmarks/gsm8k
[2] This is regardless of what GSM8K or any other benchmark is measuring.