Hacker News new | ask | show | jobs
by ltbarcly3 182 days ago
They both make stuff up and make very obvious mis-interpretations of evidence. If you take the output of an LLM, and ask another LLM to check it, this dramatically reduces this. Even if you do it with the same LLM but without the existing context. I was able to write a detailed analysis of a rule system by doing this with 3 steps, claude -> chatgpt -> gemini3. It caught all the mistakes, including overstatements and vague statements. It wasn't perfect, but even after one review the # of mistakes or stupid statements was almost 0.