Hacker News
by neelm 1049 days ago
Something like this is going to be needed to evaluate models effectively. Evaluation should be integrated into automated pipelines/workflows that can scale across models and datasets.
1 comment

Thanks Neel! We totally agree that automated evals will become an essential part of production LLM systems.