Hacker News
by atn34 217 days ago
I actually started a collection of annoying bugs I’ve seen in the wild. I give the LLM the buggy implementation and ask it to write a test that catches the bug. So far not even a frontier model (Claude Sonnet) can do it, even though it can find and fix the bug itself.
1 comment

> not even a frontier model (Claude Sonnet) can do it

Probably because Sonnet is no longer a frontier model. By Anthropic's own account, it isn't even the best model they offer.