| HN Mirror

	by erwald 1087 days ago
	Sure, it's easy: you can use benchmarks like HumanEval, which Stability did. They just didn't compare against Codex or GPT-4. Of course, such benchmarks don't capture every aspect of an LLM's capabilities, but they're a lot better than nothing!