|
|
|
|
|
by verve_rat
54 days ago
|
|
My theory is we will end up in a similar spot to hiring people. You can look at a CV (benchmarks) but you won't know for sure until you've worked with them for six months. We as an industry cannot determine if one software engineer is objectively better than another, on practically any dimension, so why do we think we can come to an objective ranking of models? |
|
But I'm more optimistic about testing programming models. You can run repeated tests, and compare median performance. You can run long tests, like hundreds of hours, while getting more than a few humans to complete half-day tests is a huge project. And you can do ablation testing, where you remove some feature of the environment or tools and see how much it helps/hurts.