by rwojo
968 days ago
This package suggests building a dataset and then using LLM-assisted evaluation (via GPT-3.5/4) to score your RAG pipeline against it. That means it relies heavily on GPT-4, or an equivalent model, to produce realistic scores. How safe is that approach?