Hacker News
by timabdulla 508 days ago
I think I hit all of those points in my previous post, except for the fact that these are two different models, as you've noted. That said, neither of them seems to report scores for the benchmark that the other one reports.