HN Mirror



	by shahules 1092 days ago
	Evals is not suitable for evaluating LLM applications such as RAG, because one has to evaluate on one's own data, where no golden test data exists, and the techniques used correlate poorly with human judgement. We have built the RAGAS framework for this: https://github.com/explodinggradients/ragas

1 comments

resiros 1092 days ago

Great project! We're building an open-source platform for building robust LLM apps (https://github.com/Agenta-AI/agenta), and we'd love to integrate your library into our evaluation!


shahules 1092 days ago

Thank you. That's great. We are working on forming paradigms to evaluate Agents, I'll get in touch with you.
