by
keheliya
510 days ago
Running it entirely locally on a MacBook Pro is possible via Ollama. Even running the full model (680B) is apparently possible when distributed across multiple M2 Ultras:
https://x.com/awnihannun/status/1881412271236346233
2 comments
vessenes
510 days ago
That’s a 3-bit quant. I don’t think there’s a theoretical reason you couldn’t run it at fp16, but it would take more than two M2 Ultras. 10 or 11, maybe!
bildung
509 days ago
Well there's the practical reason of the model natively being fp8 ;) One of the innovative ideas making it so much less compute-intensive, apparently.
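For anyone who wants to sanity-check the machine counts in this sub-thread, here is a rough back-of-the-envelope sketch. It only counts memory for the weights (no KV cache, activations, or runtime overhead). The 680B parameter count is the figure quoted above; the 192 GB per M2 Ultra and the `usable_fraction` knob are assumptions for illustration, and the function names are made up for this sketch.

```python
import math


def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Bytes needed to hold the weights alone at a given precision."""
    return n_params * bits_per_param / 8


def machines_needed(n_params: float, bits_per_param: float,
                    mem_per_machine_gb: float,
                    usable_fraction: float = 1.0) -> int:
    """Smallest number of machines whose combined usable memory fits the weights.

    usable_fraction is a hypothetical knob for the share of unified memory
    that can actually be given to the model (the rest goes to the OS, KV
    cache, and so on).
    """
    need_gb = weight_bytes(n_params, bits_per_param) / 1e9
    return math.ceil(need_gb / (mem_per_machine_gb * usable_fraction))


N_PARAMS = 680e9      # parameter count as quoted in the thread
M2_ULTRA_GB = 192     # assumption: top unified-memory configuration

for name, bits in [("3-bit", 3), ("fp8", 8), ("fp16", 16)]:
    gb = weight_bytes(N_PARAMS, bits) / 1e9
    ideal = machines_needed(N_PARAMS, bits, M2_ULTRA_GB)
    derated = machines_needed(N_PARAMS, bits, M2_ULTRA_GB, usable_fraction=0.75)
    print(f"{name}: {gb:.0f} GB of weights, "
          f"{ideal} machines ideal, {derated} at 75% usable memory")
```

Under these assumptions the weights come to about 255 GB at 3 bits, 680 GB at fp8, and 1360 GB at fp16. That fits on two machines for the 3-bit quant, and fp16 needs 8 machines if all memory were usable, or 10 if only about 75% of it is, which is in the same range as the "10 or 11" guess above.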
rsanek
510 days ago
the 70B distilled version that you can run locally is pretty underwhelming though