|
|
|
|
|
by qouteall
191 days ago
|
|
Goodhart's law: When a measure becomes a target, it ceases to be a good measure. AI companies have high incentive to make score go up. They may employ human to write similar-to-benchmark training data to hack benchmark (while not directly train on test). Throwing your hard problem at work to LLM is a better metric than benchmarks. |
|