Hacker News
by moffkalast 476 days ago
Wow, I'm surprised to see Mistral 24B that high up, or on this chart at all, with NeMo at the absolute bottom. Maybe they accidentally mislabeled the ratings, because I sure haven't seen the 24B hold a coherent conversation beyond half a dozen back-and-forth messages without having a mental breakdown and starting to repeat itself like Howard Hughes.

We definitely need to run many more simulations to get an accurate dashboard.