Hacker News
by
sanjiwatsuki
900 days ago
The VRAM usage is closer to that of a 47B model: although only 2 experts are used at a time for inference, the weights of all experts still have to be loaded in memory.
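As a rough sanity check on the 47B figure, here is a minimal sketch that counts parameters for a Mixtral-style mixture-of-experts model. The hyperparameters below are assumptions taken from the published Mixtral 8x7B configuration, not from this thread, and the breakdown ignores small details of any particular implementation.

```python
# Assumed Mixtral 8x7B hyperparameters (not stated in the thread).
HIDDEN = 4096            # model width
FFN = 14336              # feed-forward width inside each expert
LAYERS = 32
N_HEADS = 32
N_KV_HEADS = 8           # grouped-query attention
EXPERTS = 8
EXPERTS_PER_TOKEN = 2
VOCAB = 32000


def param_count(experts_counted: int) -> int:
    """Count parameters, including `experts_counted` experts per layer."""
    head_dim = HIDDEN // N_HEADS
    kv_dim = N_KV_HEADS * head_dim
    # Q and O projections are HIDDEN x HIDDEN; K and V are HIDDEN x kv_dim.
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim
    # Each expert is a gated MLP with three HIDDEN x FFN matrices.
    expert = 3 * HIDDEN * FFN
    router = HIDDEN * EXPERTS
    norms = 2 * HIDDEN
    per_layer = attention + experts_counted * expert + router + norms
    embeddings = 2 * VOCAB * HIDDEN  # input embedding + output head
    return LAYERS * per_layer + embeddings + HIDDEN  # + final norm


total_params = param_count(EXPERTS)              # must all be resident
active_params = param_count(EXPERTS_PER_TOKEN)   # touched per token
print(f"total: {total_params / 1e9:.1f}B, active: {active_params / 1e9:.1f}B")
# prints "total: 46.7B, active: 12.9B"
```

Under these assumptions the memory footprint follows the total count (about 47B), while the per-token compute follows the active count (about 13B), which is why the model needs the memory of a large model while doing far less work per token.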
discordance
899 days ago
Confirmed. I'm currently running Mixtral 8x7B GGUF (Q8_0) on a MacBook Pro M1 Max with 64 GB of RAM, and RAM usage is sitting at 48.8 GB.
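For comparison with that reading, a back-of-the-envelope estimate of the weight size at Q8_0 can be sketched as follows. It assumes roughly 46.7B total parameters (not stated in the thread) and the Q8_0 block layout of 32 int8 weights plus one fp16 scale per block. It treats every tensor as quantized and ignores the KV cache and runtime overhead, so it is only an approximation.

```python
PARAMS = 46.7e9        # assumed total parameter count for Mixtral 8x7B
BLOCK_WEIGHTS = 32     # weights per Q8_0 block
BLOCK_BYTES = 32 + 2   # 32 int8 values + one fp16 scale = 8.5 bits/weight


def q8_0_weight_bytes(n_params: float) -> float:
    """Approximate bytes needed to hold n_params weights in Q8_0."""
    return n_params / BLOCK_WEIGHTS * BLOCK_BYTES


weight_bytes = q8_0_weight_bytes(PARAMS)
print(f"{weight_bytes / 1e9:.1f} GB ({weight_bytes / 2**30:.1f} GiB)")
```

That works out to roughly 49.6 GB (about 46.2 GiB) for the weights alone, which is in the same ballpark as the 48.8 GB reported above. The exact figure depends on which unit the monitoring tool reports and on context length.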
karolist
899 days ago
How many t/s?
discordance
896 days ago
Around 15-20 t/s.
karolist
886 days ago
Thank you. I got the same M1 Max build during the B&H Christmas sale and can confirm it's amazing for running local LLMs.