| HN Mirror



	by chr15m 137 days ago
	If you view LLM-driven dev as a kind of evolutionary process rather than an engineering process (at the level of a single LLM output), then this makes a lot of sense. You're widening the population from which you select for fitness.
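
To make the "select for fitness" framing concrete, here is a minimal sketch of picking the best of several candidate outputs. Everything in it is an assumption for illustration: `select_fittest` and `toy_fitness` are hypothetical names, and the scoring rule is a stand-in for whatever real signal you have (tests passed, a judge's score, and so on).

```python
from typing import Callable, Sequence


def select_fittest(candidates: Sequence[str], fitness: Callable[[str], float]) -> str:
    """Return the candidate with the highest fitness score (first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=fitness)


def toy_fitness(output: str) -> float:
    # Hypothetical stand-in: reward outputs that include a self-check.
    return float(output.count("assert"))


# Pretend each string came from a different model or a different sample.
outputs = [
    "def add(a, b): return a + b",
    "def add(a, b): return a + b\nassert add(1, 2) == 3",
    "def add(a, b): return a - b",
]

best = select_fittest(outputs, toy_fitness)
```

A wider population only helps if the fitness function can tell candidates apart, so in practice most of the work goes into the scoring step rather than the selection step.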


languid-photic 137 days ago

This was exactly the kernel of the idea :)

chr15m 137 days ago

Ah interesting. Thank you very much for sharing the illuminating results.

One question I had - was the judgement blinded? Did judges know which models produced which output?

languid-photic 137 days ago

It was not. The agent ID is not overt, but it can be found via the workspace filepath.

But that is a good point. Perhaps it should be mapped to something unidentifiable.
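
A rough sketch of what that mapping could look like, assuming the agent ID appears verbatim in the path. The function names, the `candidate-N` label format, and the example path layout are all hypothetical, not taken from the actual setup.

```python
import random


def make_blind_labels(agent_ids, seed=None):
    """Map each agent id to an opaque label, in shuffled order."""
    rng = random.Random(seed)
    shuffled = list(agent_ids)
    rng.shuffle(shuffled)  # so label numbers don't leak the original ordering
    return {agent: f"candidate-{i + 1}" for i, agent in enumerate(shuffled)}


def scrub(text, labels):
    """Replace every agent id in text (e.g. a workspace path) with its label."""
    # Longest ids first, so an id that is a prefix of another is not half-replaced.
    for agent in sorted(labels, key=len, reverse=True):
        text = text.replace(agent, labels[agent])
    return text


# Hypothetical agent ids and path layout, for illustration only.
agents = ["model-a", "model-b", "model-c"]
labels = make_blind_labels(agents, seed=0)
blinded_path = scrub("/workspaces/model-b/src/main.py", labels)
```

The `labels` dict would be kept away from the judges and only used afterwards to map verdicts back to agents.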

chr15m 137 days ago

Ah ok. If you do run it again, that would be a worthwhile change. I know I personally have biases about models, and I have seen others say the same, so it seems likely it would skew the results at least a little.

Nonetheless you've convinced me to try an even wider variety of models, thanks!

In fact, this makes me think I should add this as a feature to my AI dev tooling - compare responses side by side and pick the best one.
