yep this is what I meant. hallucinating, justifying or missing bad stuff.
additionally, similar to how large PRs are more likely to just be skimmed and replied with a "LGTM!", an LLM missing some bad stuff but still producing a seemingly thorough review would increase the chance of the bad stuff making its way in.
allowing LLMs to write code would be fine if its truly verified by a human, but let another LLM hallucinate and cloud a persons judgement and you've got a problem
additionally, similar to how large PRs are more likely to just be skimmed and replied with a "LGTM!", an LLM missing some bad stuff but still producing a seemingly thorough review would increase the chance of the bad stuff making its way in.
allowing LLMs to write code would be fine if its truly verified by a human, but let another LLM hallucinate and cloud a persons judgement and you've got a problem