by gardnr
353 days ago
It's small (3B parameters) and does great on benchmarks. This is a model for edge/mobile deployments, so the gains over gemma3-4b are meaningful. It has a dual mode (reasoning / non-reasoning), and they released the full training method:

> We're releasing SmolLM3 with our engineering blueprint. It includes architecture details, exact data mixtures showing how we progressively boost performance across domains in a three-stage pretraining approach, and the methodology for building a hybrid reasoning model. Usually, achieving these results would require months of reverse engineering. Instead, we're providing the full methodology.
|
So this is primarily great for enterprises that want to run on-prem with limited budgets, and maybe for high-end enthusiasts.