| HN Mirror



	by chr15m 137 days ago
	Ah interesting. Thank you very much for sharing the illuminating results. One question I had - was the judgement blinded? Did judges know which models produced which output?

1 comments

languid-photic 137 days ago

It was not. The agent id is not overt, but it can be found via the workspace filepath.

But that is a good point. Perhaps it should be mapped to something unidentifiable.
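
A minimal sketch of what that mapping could look like, assuming the outputs are available as a dict of agent id to text. The function name `blind_outputs`, the `candidate-N` labels, and the input shape are all hypothetical, not how the actual run was set up:

```python
import random


def blind_outputs(outputs, seed=None):
    """Swap agent ids for opaque labels before showing outputs to judges.

    `outputs` is an assumed dict of agent id -> output text.
    Returns (blinded, key): `blinded` maps label -> scrubbed text,
    `key` maps label -> agent id and should be kept away from the judges.
    """
    rng = random.Random(seed)
    agent_ids = list(outputs)
    rng.shuffle(agent_ids)  # label order should not reveal the original order
    key = {f"candidate-{i + 1}": agent_id for i, agent_id in enumerate(agent_ids)}

    # Replace longer ids first so an id that is a prefix of another
    # (e.g. "model-a" vs "model-a-large") does not leave a partial leak.
    replacements = sorted(key.items(), key=lambda item: len(item[1]), reverse=True)

    blinded = {}
    for label, agent_id in key.items():
        text = outputs[agent_id]
        for other_label, other_id in replacements:
            # Covers ids embedded in strings such as a workspace filepath.
            text = text.replace(other_id, other_label)
        blinded[label] = text
    return blinded, key
```

The judges would only see `blinded`; `key` is used afterwards to map scores back to agents. This only scrubs literal occurrences of the id, so anything else that identifies a model (style, self-references) would still get through.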


chr15m 137 days ago

Ah ok. If you do run it again, that would be a worthwhile change. I know I personally have biases about models, and I have seen others say the same, so it seems likely that knowing which model produced an output would skew the results at least a little.

Nonetheless you've convinced me to try an even wider variety of models, thanks!

In fact, this makes me think I should add this as a feature to my AI dev tooling - compare responses side by side and pick the best one.
