by maremmano 929 days ago
Does anyone know if I can run this on a MacBook Pro M3 Max with 128GB? At what TPS?
4 comments

If I understand correctly:

RAM-wise, you can easily run a 70B with 128GB, and 8x7B is obviously smaller than that.

Compute-wise, I suppose it would be a bit slower than running a 13B.

Edit: actually, I think it might be faster than a 13B. Eight separate 7B models would be ~115GB, while Mixtral is under 90GB. I will have to wait for more info/understanding.
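
For anyone who wants to check the memory arithmetic above, here is a rough sketch. It assumes weight size is just parameter count times bytes per parameter, uses nominal sizes (7B, 70B), and ignores KV cache, activations, and runtime overhead. The `weight_gb` helper and the precision table are made up for illustration, not taken from any library.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: ignores KV cache, activations, and runtime overhead.

# Assumed storage cost per parameter for a few common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billions: float, precision: str = "fp16") -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes / 1e9 bytes-per-GB
    return params_billions * BYTES_PER_PARAM[precision]


# Eight independent 7B models at fp16, as in the comment above.
eight_7b_fp16 = weight_gb(8 * 7, "fp16")

# A nominal 70B model at fp16 and at 4-bit quantization.
dense_70b_fp16 = weight_gb(70, "fp16")
dense_70b_int4 = weight_gb(70, "int4")

print(f"8 x 7B fp16: {eight_7b_fp16:.0f} GB")
print(f"70B fp16:    {dense_70b_fp16:.0f} GB")
print(f"70B int4:    {dense_70b_int4:.0f} GB")
```

With nominal sizes, eight separate 7B models come to 112 GB at fp16, in the same ballpark as the ~115GB figure. A 70B at fp16 would not fit in 128GB under these assumptions, so "easily run a 70B" presumably means a quantized build.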

I would say so based on LLaMA 2 70B; if it's 8x inference in MoE then I guess you'd see <20 tokens/sec?
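
A hedged way to sanity-check a tokens/sec guess like this: single-user decoding is often limited by memory bandwidth, so a common rule of thumb is bandwidth divided by the bytes of weights read per token. The sketch below assumes that rule of thumb, a bandwidth of 400 GB/s (an assumed figure, substitute your own machine's), and a 70B model quantized to 4 bits. `max_tokens_per_sec` is a hypothetical helper, and the result is an optimistic ceiling, not a benchmark.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, weights_read_gb: float) -> float:
    """Bandwidth-bound upper limit on decode speed.

    Assumes every generated token reads `weights_read_gb` of weights
    from memory once, and that nothing else is the bottleneck.
    """
    if bandwidth_gb_s <= 0 or weights_read_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_gb_s / weights_read_gb


# Assumed inputs, not measurements.
ASSUMED_BANDWIDTH_GB_S = 400.0   # assumption: replace with your hardware's figure
dense_70b_int4_gb = 70 * 0.5     # nominal 70B parameters at 4 bits each

ceiling = max_tokens_per_sec(ASSUMED_BANDWIDTH_GB_S, dense_70b_int4_gb)
print(f"Upper bound for a 4-bit 70B: {ceiling:.1f} tokens/sec")
```

For an MoE model, only the experts used for a given token need to be read, so `weights_read_gb` would be smaller than the full model size. How much smaller depends on details not established in this thread.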
There's a good chance you'll be able to run it with the Ollama app soon enough.
I would like to know this as well.