by red2awn
181 days ago
Very interesting release:

* Hybrid MoE: 2-3x faster than pure MoE transformers
* 1M context length
* Trained in NVFP4
* Open source! Pretraining, mid-training, SFT and RL datasets released (the SFT HF link is a 404...)
* Open model training recipe (coming soon)

Really appreciate Nvidia being the most open lab, but they really should make sure all the links/data are available on day 0. Also interesting that the model is trained in NVFP4 but the inference weights are FP8.