Hacker News new | ask | show | jobs
by red2awn 181 days ago
Very interesting release:

* Hybrid MoE: 2-3x faster than pure MoE transformers

* 1M context length

* Trained on NVFP4

* Open Source! Pretraining, mid-training, SFT and RL dataset released (SFT HF link is 404...)

* Open model training recipe (coming soon)

Really appreciate Nvidia being the most open lab but they really should make sure all the links/data are available on day 0.

Also interesting that the model is trained in NVFP4 but the inference weights are FP8.

1 comments

The Nano model isn’t pretrained in FP4, only Super and Ultra are. And posttraining is not in FP4, so the posttrained weights of these models are not native FP4.