| HN Mirror


	by 3d27 843 days ago
	This is great. I'm also building an LLM evaluation framework that integrates all of these benchmarks in one place, so anyone can benchmark new models on a local setup in under 10 lines of code. Hope someone finds it useful: https://github.com/confident-ai/deepeval