|
|
|
|
|
by D-Machine
119 days ago
|
|
It can be reasonable to be skeptical that advances on benchmarks may be only weakly or even negatively correlated with advances on real-world tasks. I.e. a huge jump on benchmarks might not be perceptible to 99% of users doing 99% of tasks, or some users might even note degradation on specific tasks. This is especially the case when there is some reason to believe most benchmarks are being gamed. Real-world use is what matters, in the end. I'd be surprised if a change this large doesn't translate to something noticeable in general, but the skepticism is not unreasonable here. |
|