by ancientworldnow 695 days ago
This was trained to be run at FP8 with no quality loss.
1 comment

The model description on Hugging Face says: Model size - 12.2B params, Tensor type - BF16. Is the tensor type something different from the precision the model was trained to run at?
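
For what it's worth, the two numbers measure different things: "12.2B params" is how many weights there are, while the tensor type is how many bytes each stored weight takes. A rough sketch of the arithmetic, assuming the usual 2 bytes per BF16 value and 1 byte per FP8 value, and counting weights only (no activations or KV cache); the helper name is made up for illustration:

```python
# Bytes per element for each storage format (weights only).
BYTES_PER_ELEMENT = {"FP32": 4, "BF16": 2, "FP8": 1}


def weight_footprint_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e9


PARAMS = 12.2e9  # parameter count quoted from the model card

for dtype in ("BF16", "FP8"):
    print(f"{dtype}: {weight_footprint_gb(PARAMS, dtype):.1f} GB")
```

So the same 12.2B parameters would take about twice the space in BF16 as in FP8. If the comment above is right, the published checkpoint is simply stored in BF16 and FP8 is the precision it is meant to tolerate at inference.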