Hacker News
by MattSayar, 56 days ago
I recognize the sarcasm. However, the data I can find says it's performing at baseline?
https://marginlab.ai/trackers/claude-code/
ACCount37, 56 days ago
Yeah, that's my point. Humans are not reliable LLM evaluators. "Secret model nerfs" happen in "vibes" far more often than they do in any reality.