Hacker News
by
reissbaker
316 days ago
It was natively trained in FP4. Probably both to reduce VRAM usage at inference time (fits on a single H100), and to allow better utilization of B200s (which are especially fast for FP4).
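For a rough sense of the VRAM argument, here is a back-of-the-envelope sketch. The parameter count is a hypothetical placeholder, not the actual model size, and the estimate covers weights only: it ignores activations, KV cache, and the per-block scale factors that real FP4 formats carry. The 80 GB figure assumes the 80 GB H100 variant.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Weights only: no activations, KV cache, or quantization scale overhead.
    """
    return num_params * bits_per_param / 8 / 1e9


H100_VRAM_GB = 80  # assumption: the 80 GB H100 variant

# Hypothetical parameter count, chosen only to illustrate the arithmetic.
NUM_PARAMS = 120e9

for name, bits in [("BF16", 16), ("FP8", 8), ("FP4", 4)]:
    gb = weight_memory_gb(NUM_PARAMS, bits)
    verdict = "fits" if gb <= H100_VRAM_GB else "does not fit"
    print(f"{name}: ~{gb:.0f} GB of weights, {verdict} in {H100_VRAM_GB} GB")
```

Halving the bits per weight halves the weight footprint, so a model that needs several GPUs at BF16 can plausibly drop to one at FP4, depending on its actual size and runtime overhead.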
irthomasthomas
316 days ago
Interesting, thanks. I didn't know you could even train at FP4 on H100s.
reissbaker
314 days ago
It's impressive they got it to work; the lowest I'd heard of thus far was native FP8 training.