Hacker News
by segmondy 523 days ago
Yup, I ran an experiment a long time ago where I wanted best-of-2. I had Wizard, Mistral, and Llama. Each would generate a response, and then I passed the responses to all 3 models to vote, each in a fresh prompt with no reference to the previous one. 95%+ of the time, each model voted for its own response, even when another response was clearly better. LLM-as-a-judge is a joke.
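A rough sketch of how you could score this kind of self-voting test. This is not the original setup. The function names, the A/B/C labeling, and the `(voter, chosen_author)` vote log are all hypothetical, and the actual model calls are left out. The idea is to show each judge the candidates in a fresh prompt without model names, then count how often a judge's pick matches its own response.

```python
from collections import Counter

def build_judge_prompt(question, candidates):
    """Hypothetical judge prompt: candidates are labeled A, B, C... with no
    model names, and the prompt carries no prior conversation context."""
    lines = [f"Question: {question}", "",
             "Which response is best? Answer with a single letter.", ""]
    for label, text in zip("ABCDEFGHIJ", candidates):
        lines.append(f"{label}) {text}")
    return "\n".join(lines)

def self_preference_rate(votes):
    """votes: list of (voter, chosen_author) pairs, one per judge call.
    Returns the overall fraction of votes where a model picked its own answer."""
    if not votes:
        return 0.0
    own = sum(1 for voter, chosen in votes if voter == chosen)
    return own / len(votes)

def per_model_self_rate(votes):
    """Same measure, broken down by which model did the judging."""
    totals = Counter(voter for voter, _ in votes)
    own = Counter(voter for voter, chosen in votes if voter == chosen)
    return {m: own[m] / totals[m] for m in totals}

votes = [("wizard", "wizard"), ("mistral", "mistral"),
         ("llama", "llama"), ("llama", "mistral")]
print(self_preference_rate(votes))  # 0.75
print(per_model_self_rate(votes))   # {'wizard': 1.0, 'mistral': 1.0, 'llama': 0.5}
```

For comparison, a judge picking uniformly at random among 3 candidates would land on its own response about 1/3 of the time, so a self-vote rate of 95%+ is far above chance.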