by int_19h
491 days ago
Every time I see these kinds of prompts that ask an LLM for a numeric ranking, I'm very skeptical that the numbers really mean anything to the model. How does it know what a 0.5 is supposed to be? With humans, you'd have them grade things and then correct their grades so they learn what each score means from experience. But unless you specifically fine-tune your LLM, that doesn't apply.
With gemini-2 I've been able to get similar results without few-shot prompts, simply by telling it not to be a sycophant, explaining why it was important to give realistic, even harsh, scores, and saying that I expected most scores to be low, so that the high-scoring content would stand out.
In a recent test, I switched to word scores: low, medium, high, and very high. Out of about 500 examples, none scored very high. I thought that was pretty cool, because when I do find one scoring high it will stand out, and hopefully justify its score.
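A minimal sketch of the word-score approach, assuming you want to parse the model's one-word answers into an ordinal scale and check how they are distributed. The names `Score`, `RUBRIC`, `build_prompt`, `parse_score`, `tally`, and `standouts` are hypothetical, the rubric wording is illustrative rather than the prompt actually used, and the model call itself is left out.

```python
from collections import Counter
from enum import IntEnum


# Hypothetical ordinal scale mirroring the word scores: low < medium < high < very high.
class Score(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    VERY_HIGH = 3


# Illustrative rubric text (an assumption, not the commenter's actual prompt).
RUBRIC = (
    "Do not be a sycophant. Score strictly and realistically. "
    "Expect most items to score low so that genuinely strong items stand out. "
    "Answer with exactly one of: low, medium, high, very high."
)


def build_prompt(item: str) -> str:
    """Prepend the rubric to the item being scored."""
    return f"{RUBRIC}\n\nItem:\n{item}\n\nScore:"


def parse_score(reply: str) -> Score:
    """Map a word answer like 'Very High.' to a Score; reject anything off-scale."""
    key = reply.strip().lower().rstrip(".").replace(" ", "_").upper()
    try:
        return Score[key]
    except KeyError:
        raise ValueError(f"unrecognized score label: {reply!r}")


def tally(replies):
    """Count replies per bucket, in scale order, including empty buckets."""
    counts = Counter(parse_score(r) for r in replies)
    return {s.name.lower().replace("_", " "): counts[s] for s in Score}


def standouts(items, replies, threshold=Score.HIGH):
    """Return the items whose parsed score is at or above the threshold."""
    return [item for item, r in zip(items, replies) if parse_score(r) >= threshold]
```

Mapping words to an `IntEnum` keeps the labels human-readable for the model while still letting you sort and threshold them; for example, `tally(["low", "Low.", "medium", "high"])` gives two lows, one medium, one high, and zero very highs, which makes a skewed-low distribution easy to confirm.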