Hacker News
Ask HN: How do you do evals on your conversational chatbots?
2 points by deepakthakur 691 days ago
There are a lot of eval frameworks that work on the premise of <Question, Answer, Context> and give a score to the LLM response. They work pretty well when we expect a single response to a query and the context (ground truth) is provided via RAG.

How can I use this paradigm for chatbot evaluation? The problem is that conversational bots also have a chat history in addition to the last question in the chat, which doesn't seem to fit well into the <Question, Answer, Context> format. Or are there other ways to do evals? What has your experience been with testing/evaluating them?
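
To make the mismatch concrete, here is a minimal Python sketch of the one workaround I can think of: folding the earlier turns into the question field so a single-turn framework can still score the last reply. The names `EvalSample`, `Turn`, and `flatten_chat` are made up for illustration and don't come from any particular framework.

```python
from dataclasses import dataclass


@dataclass
class EvalSample:
    """The single-turn <Question, Answer, Context> record (hypothetical type)."""
    question: str
    answer: str
    context: str


@dataclass
class Turn:
    """One message in the chat history (hypothetical type)."""
    role: str  # "user" or "assistant"
    text: str


def flatten_chat(history: list[Turn], retrieved_context: str) -> EvalSample:
    """Fold a multi-turn chat into one sample that scores only the final bot reply."""
    if (
        len(history) < 2
        or history[-2].role != "user"
        or history[-1].role != "assistant"
    ):
        raise ValueError("expected the chat to end with a user turn and an assistant turn")

    # Everything before the last exchange becomes part of the question text.
    *earlier, last_user, last_bot = history
    question = last_user.text
    if earlier:
        transcript = "\n".join(f"{t.role}: {t.text}" for t in earlier)
        question = (
            f"Conversation so far:\n{transcript}\n\n"
            f"Latest user message: {last_user.text}"
        )
    return EvalSample(question=question, answer=last_bot.text, context=retrieved_context)
```

This only scores the final turn, though. It says nothing about whether the bot stayed consistent across the conversation, which is the part I don't know how to evaluate.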