by yorwba 128 days ago
In this context, "performance" means "does it do what we want it to do", not "does it do it quickly". Quality of output is what they're measuring; speed is not a consideration.

The point is that whether it does what you tell it in a single iteration is less important than whether it avoids stupid mistakes. Any serious use will put it in a harness.
My point is that you misread the comment you replied to. (By the way, on page 2 of the paper: "we evaluate each LLM only within its corresponding harness.")
> My point is that you misread the comment you replied to.

I'm not the person you replied to.

> (By the way, on page 2 of the paper: "we evaluate each LLM only within its corresponding harness.")

That has zero relevance to my comment, or to the type of harnesses I talked about in the comment you replied to or in my comment up-thread.

The only people I have replied to in this thread were vidarh, vidarh, and now vidarh again. I thought you were all the same person?