Hacker News new | ask | show | jobs
by krawfy 1049 days ago
For now, we just aggregate those across the models / prompts / templates you're evaluating so that you can get an aggregate score. You can export to CSV, JSON, MongoDB, or Markdown files, and we're working on more persistence features so that you can get a history of which models / prompts / templates you gave the best scores to, and keep track of your manual evaluations over time.