by GabrielBianconi 14 days ago
Any function that can score (i.e. "evaluate") your LLM system (e.g. your agent).

For example:

- You write a heuristic (regex, code, etc.) that assigns a score to an output

- You make another LLM score the output from your system (aka "LLM-as-a-judge")

- You have an automated system that can verify the generated outputs (e.g. does generated code compile or pass tests?)
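To make the three cases concrete, here is a rough Python sketch of each kind of evaluator. Everything in it is made up for illustration: the function names, the "contains a 4-digit year" heuristic, the 1-5 rating prompt, and `fake_judge`, which stands in for a real model call. It assumes the system's output is a plain string and that scores are floats in [0, 1].

```python
import re
from typing import Callable


def heuristic_score(output: str) -> float:
    """Heuristic evaluator: 1.0 if the output contains a 4-digit number, else 0.0."""
    return 1.0 if re.search(r"\b\d{4}\b", output) else 0.0


def judge_score(output: str, judge: Callable[[str], str]) -> float:
    """LLM-as-a-judge: ask another model for a 1-5 rating, normalized to [0, 1]."""
    reply = judge(f"Rate this answer from 1 to 5. Reply with one digit.\n\n{output}")
    match = re.search(r"[1-5]", reply)
    # An unparseable reply is scored 0.0 here; that policy is an assumption.
    return (int(match.group()) - 1) / 4 if match else 0.0


def compiles_score(code: str) -> float:
    """Automated verifier: 1.0 if the generated Python source compiles, else 0.0."""
    try:
        compile(code, "<generated>", "exec")
        return 1.0
    except SyntaxError:
        return 0.0


def fake_judge(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call; always answers "4".
    return "4"
```

All three have the same shape (output in, score out), so something like `judge_score("some answer", fake_judge)` can sit next to `compiles_score("x = 1")` in the same list of evaluators.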

People often talk about "LLM evals" (evaluations), which usually include a set of evaluators, i.e. scoring functions.

We'll make this clearer next time!