by
vessenes
510 days ago
That’s a 3-bit quant. I don’t think there’s a theoretical reason you couldn’t run it in fp16, but it would take more than two M2 Ultras. 10 or 11, maybe!
bildung
509 days ago
Well, there's the practical reason of the model natively being fp8 ;) One of the innovative ideas making it so much less compute-intensive, apparently.
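For anyone who wants to check the back-of-envelope numbers in this thread, here is a rough sketch of the arithmetic: weight memory is roughly parameter count times bits per weight, divided by 8. The thread does not state the parameter count, so `N_PARAMS` below is an illustrative assumption, as is the 75% usable-memory fraction. The 192 GB figure is the largest M2 Ultra unified-memory configuration. KV cache and activations are ignored, so treat the results as lower bounds.

```python
import math

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

def machines_needed(n_params: float, bits_per_weight: float,
                    usable_gb_per_machine: float) -> int:
    """Number of machines needed to hold the weights, rounded up."""
    total_gb = weight_memory_gb(n_params, bits_per_weight)
    return math.ceil(total_gb / usable_gb_per_machine)

# Assumption: parameter count is NOT given in the thread; 671e9 is illustrative.
N_PARAMS = 671e9
# Assumption: only ~75% of a 192 GB M2 Ultra is usable for model weights.
USABLE_GB = 192 * 0.75

for bits in (3, 8, 16):
    gb = weight_memory_gb(N_PARAMS, bits)
    n = machines_needed(N_PARAMS, bits, USABLE_GB)
    print(f"{bits:>2}-bit: ~{gb:.0f} GB of weights, at least {n} machine(s)")
```

Under those assumed inputs the 3-bit quant fits on two machines and fp16 needs ten, which is in line with the estimate above; the native fp8 weights would land in between. Change `N_PARAMS` or `USABLE_GB` and the counts shift accordingly.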