by k__ 281 days ago
Yes, you often see huge gains on some benchmark, then the model is run through Aider's polyglot benchmark and doesn't even hit 60%.