| HN Mirror

	by neelm 1096 days ago
	Something like this is going to be needed to evaluate models effectively. Evaluation should be integrated into automated pipelines/workflows that can scale across models and datasets.

1 comment

Thanks Neel! We totally agree that automated evals will become an essential part of production LLM systems.