by htrp
199 days ago
Trinity Nano Preview: 6B parameter MoE (1B active, ~800M non-embedding), 56 layers, 128 experts with 8 active per token

Trinity Mini: 26B parameter MoE (3B active), fully post-trained reasoning model

They did the pretraining on their own and are still training the large version on 2048 B300 GPUs.
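For a rough sense of how sparse these models are, here is a quick back-of-the-envelope sketch in Python. It takes the figures quoted above at face value (rounded parameter counts in billions); the dictionary layout and the `active_fraction` helper are just illustrative names, not anything from the release.

```python
# Figures as quoted in the comment (parameter counts in billions, rounded).
models = {
    "Trinity Nano Preview": {"total_b": 6, "active_b": 1},
    "Trinity Mini": {"total_b": 26, "active_b": 3},
}


def active_fraction(total_b, active_b):
    """Share of total parameters used per token (hypothetical helper)."""
    return active_b / total_b


# Nano Preview: 8 of 128 experts are active per token.
expert_fraction = 8 / 128

for name, m in models.items():
    share = active_fraction(m["total_b"], m["active_b"])
    print(f"{name}: {share:.1%} of parameters active per token")

print(f"Nano Preview routes each token to {expert_fraction:.2%} of its experts")
```

The active-parameter share is larger than the expert share because always-on parts of the network (attention, embeddings, and so on) count as active for every token.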