HN Mirror



	by namegulf 48 days ago
	They have a 512 GB RAM option, but it's pricey. Have you tried any other models with this M3 Ultra?


bigyabai 48 days ago

The 512 GB model would have to run a lobotomized 2-bit or 1-bit quant (Q2/Q1), and you would still be waiting 3-5 minutes to process context lengths in the 32,000-64,000 token range.
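As a rough sanity check on those figures: if the entire 3-5 minute wait is prompt processing (prefill), which is an assumption since the comment doesn't break the time down, the implied throughput works out like this. The function name `implied_prefill_rate` is just for illustration.

```python
# Back-of-envelope: what prefill speed do the quoted numbers imply?
# Assumption: the whole wait is spent processing the prompt/context.

def implied_prefill_rate(tokens: int, seconds: float) -> float:
    """Tokens processed per second if `tokens` of context take `seconds`."""
    return tokens / seconds

# Ranges quoted above: 32,000-64,000 tokens of context, 3-5 minutes.
low = implied_prefill_rate(32_000, 5 * 60)    # slowest case: smallest context, longest wait
high = implied_prefill_rate(64_000, 3 * 60)   # fastest case: largest context, shortest wait

print(f"implied prefill: {low:.0f}-{high:.0f} tokens/s")
```

That is roughly 107-356 tokens/s, which is why long contexts feel so slow on this setup.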

Apple's GPUs just aren't very fast at inference, especially prompt processing. I'd stick to the smaller 7B-18B parameter range, or to MoE models like Qwen, if you want usable inference speed.
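A simple weight-only memory estimate (parameters × bits per weight / 8) shows why a 7B-18B model fits comfortably, and why a very large model squeezed into 512 GB ends up at 2-bit or 1-bit. This is a sketch that ignores the KV cache, activations, and runtime overhead; real quant formats also store per-block scale data, so actual files run somewhat larger. The helper names are hypothetical.

```python
# Weight-only memory estimate. Ignores KV cache, activations, and runtime
# overhead, so real memory use will be higher than these numbers.

def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    # 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB cancels out.
    return params_billions * bits_per_weight / 8

def max_params_billions(memory_gb: float, bits_per_weight: float) -> float:
    """Largest parameter count (in billions) whose weights alone fit in memory_gb."""
    return memory_gb * 8 / bits_per_weight

for params in (7, 18):
    for bits in (4, 8, 16):
        print(f"{params}B @ {bits}-bit: ~{weight_gb(params, bits):.1f} GB")

for bits in (2, 4, 8, 16):
    print(f"512 GB @ {bits}-bit: up to ~{max_params_billions(512, bits):.0f}B params (weights only)")
```

For example, a 7B model at 4-bit is about 3.5 GB of weights, while 512 GB holds at most ~1,024B parameters at 4-bit before any KV cache is counted.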


namegulf 48 days ago

Looks like that's a good idea for now. Yeah, 3-5 minutes isn't practical.

Any thoughts on the M5?

Apple may soon release M5 versions of the Mac Studio and Mac mini.


namegulf 48 days ago

Is the NVIDIA DGX Spark a good option?

It's listed at $4,699.00.

But it looks like we may also need an NVIDIA AI Enterprise - DGX Spark License.
