by curo 1065 days ago
OpenAI open-sourced their evals framework. You can use it to evaluate not just different models but also your entire prompt chain setup. https://github.com/openai/evals

The framework also ships with a built-in registry of evals.
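
To make "evaluate your entire prompt chain" concrete, here is a rough sketch of the general idea in plain Python. It does not use the evals library's API. The function names, the sample keys (`input`, `ideal`), and the toy chain are all made up for illustration. The chain is treated as a black box from input string to output string and scored by exact match.

```python
from typing import Callable, Iterable

# Hypothetical sample format: an input paired with the answer we expect back.
Sample = dict[str, str]


def exact_match_accuracy(chain: Callable[[str], str], samples: Iterable[Sample]) -> float:
    """Run each sample through the chain and return the fraction matched exactly."""
    results = [chain(s["input"]).strip() == s["ideal"].strip() for s in samples]
    return sum(results) / len(results) if results else 0.0


# Stand-in for a real prompt chain (prompt template + model call + post-processing).
def toy_chain(question: str) -> str:
    answers = {"2+2?": "4", "Capital of France?": "Paris"}
    return answers.get(question, "unknown")


samples = [
    {"input": "2+2?", "ideal": "4"},
    {"input": "Capital of France?", "ideal": "Paris"},
    {"input": "Largest planet?", "ideal": "Jupiter"},
]

score = exact_match_accuracy(toy_chain, samples)
print(f"accuracy: {score:.2f}")
```

Swapping `toy_chain` for a function that calls your actual chain lets you compare models or prompt variants on the same sample set.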