

	by sanjiwatsuki 900 days ago
	The VRAM usage is closer to that of a 47B model: although only 2 experts are used at a time for inference, all of the experts have to be loaded in memory to run it.


Confirmed. Currently running Mixtral 8x7B GGUF (Q8_0) on a MacBook Pro M1 Max with 64 GB of RAM, and RAM usage is sitting at 48.8 GB.
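
As a rough sanity check on the numbers above, here is a minimal back-of-the-envelope sketch of the weight footprint. It assumes the ~47B total parameter count quoted upthread and assumes Q8_0 stores each block of 32 weights as 32 int8 values plus one 16-bit scale (about 8.5 bits per weight). The function name and constants are illustrative, not taken from any library. The reported process usage also includes the KV cache and runtime overhead, so an exact match is not expected.

```python
def estimate_weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate bytes needed to hold the model weights alone."""
    return n_params * bits_per_weight / 8


# Assumption: a Q8_0 block is 32 int8 weights plus one fp16 scale = 34 bytes.
Q8_0_BITS_PER_WEIGHT = 34 * 8 / 32  # 8.5 bits per weight

# Assumption: ~47B total parameters, the figure quoted in the thread.
# All experts are counted, because all of them must be resident in memory.
TOTAL_PARAMS = 47e9

weight_bytes = estimate_weight_bytes(TOTAL_PARAMS, Q8_0_BITS_PER_WEIGHT)
weight_gb = weight_bytes / 1e9     # decimal gigabytes
weight_gib = weight_bytes / 2**30  # binary gibibytes

print(f"Estimated weights: {weight_gb:.1f} GB ({weight_gib:.1f} GiB)")
```

Under these assumptions the estimate lands in the high-40s of GB, the same ballpark as the 48.8 GB reported above.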

How many t/s?

Around 15-20 t/s.

Thank you. I got the same M1 Max build during the Christmas B&H sale and can confirm it's amazing for running local LLMs.