by BryanLegend
930 days ago
Andrej Karpathy's take:

New open weights LLM from @MistralAI

params.json:
- hidden_dim / dim = 14336/4096 => 3.5X MLP expand
- n_heads / n_kv_heads = 32/8 => 4X multiquery
- "moe" => mixture of experts 8X top 2

Likely related code: https://github.com/mistralai/megablocks-public

Oddly absent: an over-rehearsed professional release video talking about a revolution in AI.

If people are wondering why there is so much AI activity right around now, it's because the biggest deep learning conference (NeurIPS) is next week.

https://twitter.com/karpathy/status/1733181701361451130
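For anyone who wants the params.json arithmetic spelled out, here is a minimal Python sketch. It is not Mistral's code. The numbers (dim 4096, hidden_dim 14336, 32 query heads, 8 KV heads, 8 experts, top 2) come from the post, but the key names inside `"moe"` are assumptions. What the post calls "4X multiquery" means 4 query heads share each key/value head, which is usually called grouped-query attention. The router function `top_k_route` is hypothetical. It uses one common top-2 formulation: pick the 2 highest-scoring experts, then softmax over just those scores.

```python
import json
import math

# Hypothetical params.json contents: the numbers are the ones quoted in the post,
# but the key names inside "moe" are assumptions, not copied from the release.
PARAMS_JSON = """
{
  "dim": 4096,
  "hidden_dim": 14336,
  "n_heads": 32,
  "n_kv_heads": 8,
  "moe": {"num_experts": 8, "num_experts_per_tok": 2}
}
"""


def summarize(params: dict) -> dict:
    """Derive the ratios discussed in the post from the raw config."""
    return {
        # MLP expansion factor: 14336 / 4096 = 3.5
        "mlp_expand": params["hidden_dim"] / params["dim"],
        # Query heads per shared key/value head: 32 / 8 = 4
        "q_per_kv": params["n_heads"] // params["n_kv_heads"],
        # Per-head width: 4096 / 32 = 128
        "head_dim": params["dim"] // params["n_heads"],
    }


def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Hypothetical router: keep the k best experts, softmax over their logits."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


params = json.loads(PARAMS_JSON)
stats = summarize(params)
print(stats)  # {'mlp_expand': 3.5, 'q_per_kv': 4, 'head_dim': 128}

# Made-up router scores for one token over the 8 experts (illustration only).
router_logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]
routes = top_k_route(router_logits, params["moe"]["num_experts_per_tok"])
print(routes)  # experts 1 and 4 get picked; their weights sum to 1
```

The point of "8X top 2" is that each token only runs through 2 of the 8 expert MLPs. Total parameter count grows roughly with the number of experts, while per-token MLP compute stays at about 2 experts' worth. Likewise, 8 KV heads instead of 32 shrinks the KV cache by 4X.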