Show HN: Reference-free evaluation of LLM-powered chatbots

Show HN: Reference-free evaluation of LLM-powered chatbots (github.com)

2 points by Joschkabraun 934 days ago

Hey HN!

This is an interactive demo with a *somewhat* helpful AI assistant. The goal is to demonstrate a good way to evaluate interactions between humans and AI assistants without references. Reference-free means that you do not provide a correct answer to a query. The metric used here is the goal success ratio, which measures how many queries a user needs to send to reach their goal.
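To make the metric concrete, here is a rough sketch of how a goal success ratio could be computed. The exact formula is my assumption for illustration (reached goals divided by total user queries, so fewer queries per goal gives a higher score), and the `GoalSegment` record and its fields are hypothetical, not taken from the demo's code.

```python
from dataclasses import dataclass


@dataclass
class GoalSegment:
    """Hypothetical record: one user goal pursued within a conversation."""
    num_queries: int  # user messages sent while pursuing this goal
    reached: bool     # whether the user ended up reaching the goal


def goal_success_ratio(segments):
    """Assumed definition: reached goals divided by total user queries."""
    total_queries = sum(s.num_queries for s in segments)
    if total_queries == 0:
        return 0.0  # no queries sent, so nothing to score
    reached = sum(1 for s in segments if s.reached)
    return reached / total_queries


# Three goals: reached after 1 query, reached after 3, abandoned after 4.
segments = [GoalSegment(1, True), GoalSegment(3, True), GoalSegment(4, False)]
print(goal_success_ratio(segments))  # 2 reached goals / 8 queries = 0.25
```

Note that no reference answers appear anywhere in the computation. The only inputs are where a goal starts and ends and whether it was reached, which in practice would have to be inferred from the conversation itself.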

A guide on reference-free evaluation of any LLM app (chat, RAG, summarization, etc.) will follow in the near future.

Try it out and please share any feedback!