HN Mirror

Hacker News


	by k__ 281 days ago
	Yes, you often see huge gains on some benchmark, then the model is run through Aider's polyglot benchmark and doesn't even hit 60%.