Hacker News new | ask | show | jobs
by cma 353 days ago
Didn't deep-seek figure out how to train with mixed precision and so get much more out of the cards, with a lot of the training steps able to run at what was traditionally post training quantization type precisions (block compressed).