by fulafel 14 days ago
The Mac is very feeble compared to the big iron that the providers run, so it will deliver much lower performance. Also, many companies would prefer that engineers work on domain problems instead of on novel LLMs.

I meant “roll your own” LLM for use, not building new ones.
The Mac Studio (and DGX Spark, for that matter) fall short of running SOTA-level models by a large margin. Time is money, and waiting on these half-baked solutions is a waste of both.

Especially concerning the Mac Studio, the GPU is far too weak for enterprise-scale context prefill. You'd need 2 or 4 Studios to process 250k-token contexts quickly, and even then you'd be bottlenecked by the relatively slow memory bandwidth during the decode stage. It is simply terrible hardware for quick or power-efficient inference.
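
To make the prefill vs. decode argument above concrete, here is a rough back-of-envelope sketch. It assumes the common rule of thumb that prefill is compute-bound (about 2 FLOPs per parameter per prompt token) and that decode is bandwidth-bound (every weight is read once per generated token). The function names and all hardware and model numbers are placeholders I made up for illustration, not measured specs of any Mac Studio or DGX Spark, and the estimate ignores attention cost and KV-cache traffic, which make long contexts worse.

```python
def prefill_seconds(params: float, prompt_tokens: int, flops_per_sec: float) -> float:
    """Compute-bound estimate: ~2 FLOPs per parameter per prompt token."""
    return 2 * params * prompt_tokens / flops_per_sec


def decode_tokens_per_sec(params: float, bytes_per_param: float,
                          bandwidth_bytes_per_sec: float) -> float:
    """Bandwidth-bound estimate: all weights are streamed once per token."""
    return bandwidth_bytes_per_sec / (params * bytes_per_param)


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not real hardware specs).
    params = 70e9            # hypothetical dense model size
    prompt_tokens = 250_000  # the context size mentioned above
    flops = 50e12            # hypothetical sustained FLOP/s
    bandwidth = 800e9        # hypothetical memory bandwidth in bytes/s
    bytes_per_param = 0.5    # assumes 4-bit quantized weights

    print(f"prefill: {prefill_seconds(params, prompt_tokens, flops):.0f} s")
    print(f"decode:  {decode_tokens_per_sec(params, bytes_per_param, bandwidth):.1f} tok/s")
```

Swap in whatever sustained FLOP/s and bandwidth figures you trust for your own hardware; the point is only that prefill time scales with compute while decode speed scales with memory bandwidth, so adding boxes helps one without necessarily fixing the other.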