by brandall10 806 days ago
Unless something has changed, it needs to load all 8 experts into memory at the same time. During inference it performs like a model about 2x the size of the base model, since only two experts are active per token.

Mixtral 8x7B at 5-bit takes up over 30 GB on my M3 Max. That's over 90 GB for this one at the same quantization. Realistically, you probably need a 128 GB machine to run this with good results.

1 comment

A 4-bit quant of the new one would still be about 70 GB, so yeah. Gonna need a lot more RAM.
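
For anyone who wants to sanity-check the numbers in this thread, here is a rough back-of-the-envelope sketch. It only counts weight storage (parameters times bits per weight) and ignores KV cache, quantization scales, and runtime overhead, so real usage will be somewhat higher. The parameter counts are my assumptions, not something stated above: roughly 46.7B total for Mixtral 8x7B, and a hypothetical 141B for the new model, picked because it lines up with the figures quoted here.

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Counts only the raw weights; KV cache, quantization metadata,
    and runtime overhead are not included.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Assumed total parameter counts (not taken from the thread).
MIXTRAL_8X7B_PARAMS = 46.7e9   # all 8 experts must be resident in memory
NEW_MODEL_PARAMS = 141e9       # hypothetical size for "the new one"

if __name__ == "__main__":
    for name, params in [("8x7B", MIXTRAL_8X7B_PARAMS), ("new model", NEW_MODEL_PARAMS)]:
        for bits in (4, 5):
            gb = quantized_weight_gb(params, bits)
            print(f"{name} @ {bits}-bit: ~{gb:.1f} GB of weights")
```

Under those assumptions the raw weights come out a bit below the numbers reported above, which is what you'd expect once per-block scales and runtime buffers are added on top.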