Show HN: Reference-free evaluation of LLM-powered chatbots (github.com)
2 points by Joschkabraun 934 days ago
Hey HN!

This is an interactive demo with a *somewhat* helpful AI assistant. The goal is to demonstrate a good way to evaluate interactions between humans and AI assistants without references. Reference-free means that you do not provide a correct answer to a query. The metric used here is the goal success ratio, which measures how many queries a user needs to send to reach their goal.
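
To make the metric concrete, here is a minimal sketch of one way a goal success ratio could be computed. The exact formula is not spelled out in this post, so the definition below (goals reached divided by queries sent) and the names `Session` and `goal_success_ratio` are assumptions for illustration, not the demo's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """One user session (hypothetical structure, assumed for this sketch)."""
    queries: int        # number of queries the user sent in the session
    goal_reached: bool  # whether the user reached their goal


def goal_success_ratio(sessions: List[Session]) -> float:
    """Assumed definition: goals reached divided by total queries sent.

    A higher value means users need fewer queries per reached goal.
    Returns 0.0 when no queries were sent at all.
    """
    total_queries = sum(s.queries for s in sessions)
    goals_reached = sum(1 for s in sessions if s.goal_reached)
    if total_queries == 0:
        return 0.0
    return goals_reached / total_queries


# Made-up example data, not results from the demo.
example_sessions = [
    Session(queries=2, goal_reached=True),
    Session(queries=3, goal_reached=True),
    Session(queries=5, goal_reached=False),
]
example_ratio = goal_success_ratio(example_sessions)
```

With this made-up data, 2 goals are reached over 10 queries, so the ratio is 0.2; its inverse (5) reads as the average number of queries per reached goal. No reference answers are needed, only a signal for whether the user got what they wanted.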

A guide on reference-free evaluation of any LLM app (chat, RAG, summarization, etc.) will follow in the near future.

Try it out and please share any feedback!