by sanjiwatsuki 900 days ago
The VRAM usage is closer to that of a 47B model: although only 2 experts are active at a time during inference, the weights of all experts have to be loaded in memory to run it.

Confirmed. I'm currently running the Mixtral 8x7B GGUF (Q8_0) on a MacBook Pro M1 Max with 64 GB of RAM, and RAM usage is sitting at 48.8 GB.
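For anyone who wants to sanity-check that number, here is a rough back-of-envelope sketch. It assumes the ~47B total parameter count quoted above and assumes Q8_0 stores weights in blocks of 32 int8 values plus one 16-bit scale (about 8.5 bits per weight). The function names are made up for illustration, and the estimate covers weights only, not the KV cache or runtime overhead.

```python
# Rough weight-memory estimate for a quantized model (weights only).
# Assumptions: ~47e9 total parameters (figure quoted in the thread) and a
# Q8_0 layout of 32 int8 weights plus one 16-bit scale per block.

def q8_0_bits_per_weight(block_size: int = 32, scale_bits: int = 16) -> float:
    """Effective bits per weight, including the per-block scale."""
    return (block_size * 8 + scale_bits) / block_size


def weight_memory_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to hold n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8


TOTAL_PARAMS = 47e9  # all experts must be resident, not just the 2 in use

bpw = q8_0_bits_per_weight()
total_bytes = weight_memory_bytes(TOTAL_PARAMS, bpw)
total_gb = total_bytes / 1e9     # decimal gigabytes
total_gib = total_bytes / 2**30  # binary gibibytes

print(f"bits per weight: {bpw}")
print(f"weights: {total_gb:.1f} GB ({total_gib:.1f} GiB)")
```

Under those assumptions the weights alone come out at roughly 50 GB (about 46.5 GiB), which is in the same ballpark as the 48.8 GB reported here. The exact figure depends on which unit the memory monitor reports and on context length.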
How many t/s?
Around 15-20 t/s.
Thank you. I got the same M1 Max build during the Christmas B&H sale and can confirm it's amazing for running local LLMs.