Hacker News
by xlayn 97 days ago
I updated the results for just the Devstral model, but ran the full suite on it, and posted all the results files as well as a script to re-run the process.

The results are more spectacular...

The model scored way better on gsm8k, but lost a bit in the other categories.