Hacker News
by danielcampos93 479 days ago
Using GPT-4o as a judge to evaluate the quality of something GPT-4o itself is not inherently that good at. Red flag.