by andai
792 days ago
Sibling comment made a good point about benchmarks not being a great indicator of real-world quality. Every time something scores near GPT-4 on benchmarks, I try it out and it ends up being less reliable than GPT-3 within a few minutes of usage.

It may even be the case that in optimizing for the benchmarks, these product teams sacrifice some real-world performance (just as a student who only studies for the SAT might sacrifice some real-world skills).