Hacker News
by ForzaAaRon 329 days ago
Fascinating read. Interesting how Opus performs worse than Sonnet.
1 comment

Quite interesting, actually. I'm not sure why; I assume it just overthinks. What surprised me even more is how badly o4-mini performed, after taking up hours of evaluation time and more credits than all the other LLMs combined. More thinking != better (integration) coding performance.