by nomel 1076 days ago
If we could use GPT-4 to grade prompts, we wouldn't need to be talking about grading prompts for GPT-4 in the first place, since this solution requires that the problem doesn't exist. The question then becomes: how do you grade the prompt grading, objectively? At the bottom, there has to be a ground truth.

You can't use the thing you're testing to evaluate its own performance. This applies to rulers, speedometers, and AI. It's the difference between "subjective" and "objective" metrics. If you want an objective metric, it has to be based on something external and grounded in reality. Otherwise, you have metrics and ideas that have to hold themselves up.

Source: My day job is test and measurement. These concepts go back centuries: you never trust your measurement system; you verify it against a standard.