by snemvalts
15 days ago
Most benchmarks can be trained for as well, so scores tend to overstate a model's engineering skills.
The entire nature of a benchmark is to collapse some qualitative work (a software engineering task, an architecture choice, code quality) into a quantitative score that can be optimized for.