|
|
|
|
|
by hamiltont
149 days ago
|
|
Anecdotal tip on LLM-as-judge scoring - Skip the 1-10 scale, use boolean criteria instead, then weight manually e.g. - Did it cite the 30-day return policy? Y/N
- Tone professional and empathetic? Y/N
- Offered clear next steps? Y/N Then: 0.5 * accuracy + 0.3 * tone + 0.2 * next_steps Why: Reduces volatility of responses while still maintaining creativeness (temperature) needed for good intuition |
|
Failures are fed back to the LLM so it can regenerate taking that feedback into account. People are much happier with it than I could have imagined, though it's definitely not cheap (but the cost difference is very OK for the tradeoff).