by embedding-shape
173 days ago
> $10k gets you a Mac Studio with 512GB of RAM, which definitely can run GLM-4.7 with normal, production-grade levels of quantization (in contrast to the extreme quantization that some people talk about).

Please do give that a try and report back the prefill and decode speed. Unfortunately, I think what I wrote earlier will apply again:

> In practice, it'll be incredibly slow and you'll quickly regret spending that much money on it.

I'd rather put that $10K toward an RTX Pro 6000 if I were choosing between them.
M4 Max here w/ 128GB RAM. Can confirm this is the bottleneck.
https://pastebin.com/2wJvWDEH
I considered a DGX Spark but thought the M4 would be competitive with equal RAM. Not so much.
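A rough back-of-envelope sketch of why "fits in RAM" and "runs fast" are separate questions. Under the common simplifying assumption that decode is memory-bandwidth-bound, each generated token has to stream the model's active weights from memory once, so bandwidth divided by bytes-per-token gives a ceiling on tokens/sec. The parameter counts and bandwidth figure in the example are hypothetical placeholders, not GLM-4.7's or any Mac's published specs. Prefill is mostly compute-bound and isn't modeled here at all.

```python
def weight_footprint_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone in GB (ignores KV cache and runtime overhead)."""
    return total_params_billion * bits_per_weight / 8


def fits_in_ram(total_params_billion: float, bits_per_weight: float, ram_gb: float) -> bool:
    """True if the raw weights are smaller than the machine's RAM (a necessary, not sufficient, check)."""
    return weight_footprint_gb(total_params_billion, bits_per_weight) < ram_gb


def decode_tok_per_s_ceiling(active_params_billion: float, bits_per_weight: float,
                             bandwidth_gb_s: float) -> float:
    """Memory-bound upper limit: every token reads the active weights once from memory."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    # HYPOTHETICAL numbers for illustration only: a 355B-total / 32B-active MoE model
    # at 8-bit quantization, on a machine with an assumed 800 GB/s of memory bandwidth.
    total_b, active_b, bits = 355, 32, 8
    print(weight_footprint_gb(total_b, bits))        # weights alone, in GB
    print(fits_in_ram(total_b, bits, ram_gb=512))    # 512GB box from the quoted comment
    print(fits_in_ram(total_b, bits, ram_gb=128))    # 128GB box from the reply
    print(decode_tok_per_s_ceiling(active_b, bits, bandwidth_gb_s=800))
```

Real-world decode numbers land below this ceiling, and long prompts make prefill time dominate on hardware with less compute, which is the regime the replies above are describing.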