| HN Mirror



	by verve_rat 54 days ago
	My theory is we will end up in a similar spot to hiring people. You can look at a CV (benchmarks) but you won't know for sure until you've worked with them for six months. We as an industry cannot determine if one software engineer is objectively better than another, on practically any dimension, so why do we think we can come to an objective ranking of models?

5 comments

tlb 54 days ago

Yes, the entire field of software engineering ran aground on not being able to test how well people can write software.

But I'm more optimistic about testing programming models. You can run repeated tests, and compare median performance. You can run long tests, like hundreds of hours, while getting more than a few humans to complete half-day tests is a huge project. And you can do ablation testing, where you remove some feature of the environment or tools and see how much it helps/hurts.
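To illustrate the repeated-test, median-comparison, and ablation ideas above, here is a minimal sketch. It is not a real harness: `run_trial`, `TOOL_BONUS`, and all the numbers are hypothetical stand-ins for whatever evaluation you would actually run, and the scoring model (base skill plus per-tool bonus plus Gaussian noise) is an assumption made purely for demonstration.

```python
import random
import statistics

# Hypothetical per-tool contribution to a task score (invented for illustration).
TOOL_BONUS = {"debugger": 0.10, "docs_search": 0.05, "test_runner": 0.20}


def run_trial(base_skill: float, tools: frozenset, rng: random.Random) -> float:
    """Stand-in for one evaluation run: returns a noisy score clipped to [0, 1]."""
    score = base_skill + sum(TOOL_BONUS[t] for t in tools) + rng.gauss(0, 0.05)
    return min(1.0, max(0.0, score))


def median_score(base_skill: float, tools: frozenset,
                 n_trials: int = 200, seed: int = 0) -> float:
    """Repeat the trial many times and report the median, not a single run."""
    rng = random.Random(seed)
    return statistics.median(
        run_trial(base_skill, tools, rng) for _ in range(n_trials)
    )


def ablation(base_skill: float, tools: frozenset,
             n_trials: int = 200, seed: int = 0) -> dict:
    """Median score drop when each tool is removed in turn."""
    full = median_score(base_skill, tools, n_trials, seed)
    return {
        t: full - median_score(base_skill, tools - {t}, n_trials, seed)
        for t in sorted(tools)
    }


if __name__ == "__main__":
    for tool, drop in ablation(0.4, frozenset(TOOL_BONUS)).items():
        print(f"without {tool}: median score drops by {drop:.3f}")
```

Reusing the same seed for the full and ablated runs pairs the noise across conditions, so the measured drop reflects the removed tool rather than run-to-run variance. That kind of pairing is cheap with a model and essentially impossible with human candidates.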


zelphirkalt 54 days ago

Not many things are as manifold broken as hiring these days. I hope we do not end up there.


roymain 54 days ago

The CV-to-six-months analogy is actually exactly right and it's also why benchmarks for hiring people stopped being useful. The signal that holds up is what you see when something breaks, which is hard to compress into a number.


bartekpacia 54 days ago

this smells like an ai-generated comment so much


pishpash 54 days ago

You do not interview 1000 rounds on problems you're actually solving. If you did, hiring would be fine. Minus the social fit aspect, which isn't as relevant for a model.


PunchyHamster 54 days ago

Terrible comparison. A CV is just a list that tells you barely anything about performance, and that's when the candidate is not lying to get through the HR filter.

And we can judge developer performance; it just takes 6 months to a year of working with a team, so it's hard to get a metric.
