Hacker News
by ACCount37 58 days ago
Yeah, that's my point. Humans are not reliable LLM evaluators. "Secret model nerfs" happen in "vibes" far more often than they do in any reality.