Hacker News new | ask | show | jobs
by MattSayar 56 days ago
I recognize the sarcasm. The data I can find says it's performing at baseline however?

https://marginlab.ai/trackers/claude-code/

1 comments

Yeah, that's my point. Humans are not reliable LLM evaluators. "Secret model nerfs" happen in "vibes" far more often than they do in any reality.