by wesleyyue
638 days ago
Interesting observations:

* Llama 3.2 multimodal still ranks below Molmo from AI2, released this morning.
* AI2D: 92.3 (Llama 3.2 90B) vs 96.3 (Molmo 72B)
* Llama 3.2 1B and 3B are pruned from 3.1 8B, so no leapfrogging, unlike 3 -> 3.1.
* Notably, no code benchmarks. Deliberate exclusion of code data in distillation to maximize mobile on-device use cases?

Was hoping there would be some interesting models I could add to https://double.bot, but it doesn't seem like there are any improvements to frontier performance on coding.
(Edit: parent comment was corrected, thanks!)