Hacker News
by k__ 281 days ago
Yes, you often see huge gains on some benchmark, then the model is run through Aider's polyglot benchmark and doesn't even hit 60%.