Hacker News
by afro88 1100 days ago
> but it still might overlook important subtleties

If there's one thing we can be certain of, it's that LLMs often overlook important subtleties.

Can't believe they used GPT4 to also evaluate the results. I mean, we wouldn't trust a student to grade their own exam even when given the right answers to grade with.