by andy_xor_andrew 1118 days ago
I wonder how much longer this "Using LLMs to evaluate the quality of other LLMs" can last. Certainly it has proven valuable and useful up until now, especially since ChatGPT is a pretty high bar to evaluate against.

But it also seems like a strange, incestuous, closed-system approach.

Like, unless you are introducing something new into the system, you just have the system churning against itself, probably until it reaches an equilibrium (or else becomes incoherent).

1 comments

I wonder how long this "Using humans to rate the quality of other humans" thing can last. Surely academia has only so long before it collapses.
You're asserting that current LLMs are as capable of evaluating each other as humans with advanced degrees are?
Yes. They're both awful.
Also, humans aren't exact clones.