by fatso784 1014 days ago
ChainForge lets you do this, and also set up ad-hoc evaluations with code, LLM scorers, etc. It also shows model responses side by side for the same prompt: https://github.com/ianarawjo/ChainForge