by senko 563 days ago
Too late to edit, but here's a great, really in-depth post about using LLMs as judges to evaluate LLM outputs (when you don't have ground truth for everything): https://cameronrwolfe.substack.com/p/finetuned-judge The post focuses on fine-tuning LLMs to act as judges, but its first part is a good intro to why and how to use LLM judges at all.