by mentalgear
471 days ago
Given a benchmark corpus, the evaluation criteria could be:

- Facts extracted: the number of relevant facts extracted from the corpus
- Interpretations: based on the facts, the % of correct interpretations made
- Correct predictions: based on the above, the % of correct extrapolations / interpolations / predictions made

The ground truth could be a JSON file per example.
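To make the three criteria concrete, here is a rough sketch of scoring one example against a JSON ground truth. The JSON layout, the field names, and `score_example` are all my own assumptions for illustration, and exact string matching is only a stand-in for whatever matching a real benchmark would use (e.g. an LLM as judge).

```python
import json

# Hypothetical ground-truth file content for ONE example (layout is an assumption).
GROUND_TRUTH_JSON = """
{
  "facts": ["revenue grew 12%", "headcount fell", "two new offices opened"],
  "interpretations": ["growth is efficiency-driven"],
  "predictions": ["margins improve next year"]
}
"""


def pct_correct(made, expected):
    """% of the items the system made that also appear in the ground truth."""
    made = set(made)
    if not made:
        return 0.0
    return 100.0 * len(made & set(expected)) / len(made)


def score_example(ground_truth, output):
    """Score one system output against one ground-truth example.

    Matching is exact string equality, which is a deliberate simplification.
    """
    relevant_facts = set(ground_truth["facts"]) & set(output.get("facts", []))
    return {
        "facts_extracted": len(relevant_facts),
        "interpretations_pct": pct_correct(
            output.get("interpretations", []), ground_truth["interpretations"]
        ),
        "predictions_pct": pct_correct(
            output.get("predictions", []), ground_truth["predictions"]
        ),
    }


ground_truth = json.loads(GROUND_TRUTH_JSON)

# Hypothetical system output for the same example.
system_output = {
    "facts": ["revenue grew 12%", "headcount fell", "weather was nice"],
    "interpretations": ["growth is efficiency-driven", "company is shrinking"],
    "predictions": ["margins improve next year"],
}

scores = score_example(ground_truth, system_output)
print(scores)
```

Averaging these per-example scores over the corpus would give the benchmark numbers; the percentages here are computed over what the system actually produced, so a system that makes no interpretations scores 0 rather than being skipped.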
(If the solution you want to benchmark uses a graph DB, you could compare these aspects with an LLM as judge.)

---

The actual writing is more a matter of formal/business/academic style, which I find less relevant for a benchmark. However, I would find it crucial to run a "reverse RAG" over the generated report to ensure each claim has a source. [0]

[0] https://venturebeat.com/ai/mayo-clinic-secret-weapon-against...
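A minimal sketch of what I mean by checking each claim against the sources. This is not the method from the linked article: the function names are hypothetical, sentence splitting stands in for real claim extraction, and token overlap stands in for proper retrieval (embeddings, an LLM judge, etc.).

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_claims(report):
    """Treat each sentence of the report as one claim (a simplification)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def best_source(claim, sources):
    """Return (source_id, score) for the source covering most of the claim's tokens."""
    claim_tokens = tokens(claim)
    best_id, best_score = None, 0.0
    if not claim_tokens:
        return best_id, best_score
    for source_id, text in sources.items():
        score = len(claim_tokens & tokens(text)) / len(claim_tokens)
        if score > best_score:
            best_id, best_score = source_id, score
    return best_id, best_score


def unsupported_claims(report, sources, threshold=0.6):
    """List the claims whose best source falls below the overlap threshold."""
    flagged = []
    for claim in split_claims(report):
        _, score = best_source(claim, sources)
        if score < threshold:
            flagged.append(claim)
    return flagged


# Hypothetical inputs for illustration.
sources = {"doc1": "Revenue grew 12% in 2023 according to the annual filing."}
report = "Revenue grew 12% in 2023. The CEO plans to resign."

flagged = unsupported_claims(report, sources)
print(flagged)
```

The threshold of 0.6 is arbitrary; the point is only that every claim in the report gets traced back to some source, and anything that cannot be traced gets flagged for review.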