HN Mirror



	by keheliya 510 days ago
	Running it entirely locally on a MacBook Pro is possible via Ollama. Even running the full model (680B) is apparently possible when distributed across multiple M2 Ultras: https://x.com/awnihannun/status/1881412271236346233

2 comments

vessenes 510 days ago

That’s a 3-bit quant. I don’t think there’s a theoretical reason you couldn’t run it in fp16, but it would take more than two M2 Ultras. 10 or 11 maybe!
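
A rough back-of-the-envelope sketch of the arithmetic behind that estimate. Everything here is an assumption for illustration: the 680B parameter count is taken from the parent comment, each M2 Ultra is assumed to be the 192 GB configuration, and `USABLE_FRACTION` is a guess at how much unified memory is actually available for weights. It counts weights only and ignores the KV cache and activations.

```python
import math

# Assumptions (not from the thread, except the parameter count):
PARAMS = 680e9            # parameter count as quoted in the parent comment
M2_ULTRA_GB = 192         # assumed top memory configuration per machine
USABLE_FRACTION = 0.75    # assumed share of memory usable for model weights


def weights_gb(params: float, bits_per_param: float) -> float:
    """Size of the weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


def machines_needed(params: float, bits_per_param: float) -> int:
    """Whole machines required to hold the weights under the assumptions above."""
    usable_gb = M2_ULTRA_GB * USABLE_FRACTION
    return math.ceil(weights_gb(params, bits_per_param) / usable_gb)


if __name__ == "__main__":
    for label, bits in [("3-bit quant", 3), ("fp8", 8), ("fp16", 16)]:
        size = weights_gb(PARAMS, bits)
        count = machines_needed(PARAMS, bits)
        print(f"{label}: ~{size:.0f} GB of weights, {count} machine(s)")
```

Under these assumptions the 3-bit quant fits on two machines and fp16 needs about ten, which lines up with the numbers above. Changing `USABLE_FRACTION` shifts the fp16 count by a machine or two.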


bildung 509 days ago

Well, there's the practical reason that the model is natively fp8 ;) Apparently that's one of the innovative ideas that makes it so much less compute-intensive.


rsanek 510 days ago

The 70B distilled version that you can run locally is pretty underwhelming, though.
