by chr15m 137 days ago
If you view LLM-driven dev as a kind of evolutionary process rather than an engineering process (at the level of a single LLM output), then this makes a lot of sense. You're widening the population from which you select for fitness.

This was exactly the kernel of the idea :)
Ah interesting. Thank you very much for sharing the illuminating results.

One question I had - was the judgement blinded? Did judges know which models produced which output?

It was not. The agent ID is not overt, but it can be found via the workspace filepath.

But that is a good point. Perhaps it should be mapped to something unidentifiable.
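
For what it's worth, here is a rough sketch of what that mapping could look like. It is not what the experiment actually does. The agent IDs, the `candidate-N` label format, and the example path are all made up for illustration, and it assumes the agent ID appears verbatim in the workspace filepath.

```python
import random
import re


def blind_labels(agent_ids, seed=None):
    """Assign each agent id an opaque label in shuffled order.

    Hypothetical helper: the 'candidate-N' naming is an assumption.
    """
    rng = random.Random(seed)
    shuffled = list(agent_ids)
    rng.shuffle(shuffled)  # shuffle so label order does not leak input order
    return {agent: f"candidate-{i + 1}" for i, agent in enumerate(shuffled)}


def blind_path(path, labels):
    """Replace every agent id found in a path with its opaque label."""
    if not labels:
        return path
    # Longest ids first, so an id that is a prefix of another cannot shadow it.
    ids = sorted(labels, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(agent) for agent in ids))
    return pattern.sub(lambda m: labels[m.group(0)], path)


if __name__ == "__main__":
    # Made-up agent ids and path, for illustration only.
    labels = blind_labels(["agent-alpha", "agent-beta", "agent-gamma"])
    print(blind_path("/workspaces/agent-beta/output.md", labels))
```

The judges would only ever see the blinded paths, and the `labels` dict stays with whoever runs the comparison so the results can be un-blinded afterwards.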

Ah ok. If you do run it again that would be a worthwhile change. I know I personally have biases about models and I have seen others commenting the same - it seems likely it would skew the results at least a little.

Nonetheless you've convinced me to try an even wider variety of models, thanks!

In fact, this makes me think I should add this as a feature to my AI dev tooling - compare responses side by side and pick the best one.