Hacker News
by gizajob 10 days ago
I meant “roll your own” as in running an LLM yourself, not building new ones.

Neither the Mac Studio nor the DGX Spark, for that matter, comes anywhere close to running SOTA-level models. Time is money, and waiting on these half-baked solutions wastes both.

The Mac Studio in particular has a GPU that is far too weak for enterprise-scale context prefill. You'd need 2 or 4 Studios to process 250k-token contexts quickly, and even then you'd be bottlenecked by the relatively slow memory bandwidth during the decode stage. It is simply terrible hardware for fast or power-efficient inference.
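A rough way to sanity-check the prefill/decode split is a back-of-envelope estimate. This sketch assumes prefill is compute-bound (about 2 × active parameters FLOPs per prompt token) and decode is bandwidth-bound (every generated token reads all active weights once). The function names, utilization factors, and every number in the usage lines are hypothetical placeholders, not measured Mac Studio or DGX Spark figures.

```python
# Back-of-envelope LLM inference timing: a rough sketch, not a benchmark.
# Assumptions (all hypothetical; substitute real hardware figures):
#   - prefill is compute-bound: ~2 * active_params FLOPs per prompt token
#   - decode is memory-bandwidth-bound: each new token reads all active weights once
#   - attention FLOPs and KV-cache reads are ignored, which understates long-context cost
#   - multi-device prefill scales perfectly (no interconnect overhead)

def prefill_seconds(active_params: float, prompt_tokens: int, tflops: float,
                    utilization: float = 0.5, n_devices: int = 1) -> float:
    """Estimated time to process the whole prompt before the first output token."""
    flops = 2.0 * active_params * prompt_tokens
    sustained_flops = tflops * 1e12 * utilization * n_devices
    return flops / sustained_flops


def decode_tokens_per_second(active_params: float, bytes_per_param: float,
                             bandwidth_gb_s: float, utilization: float = 0.7) -> float:
    """Upper bound on single-stream generation speed when weight reads dominate."""
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_gb_s * 1e9 * utilization / bytes_per_token


if __name__ == "__main__":
    # Hypothetical example: 70B dense model, 4-bit weights, 250k-token prompt.
    for n in (1, 2, 4):
        t = prefill_seconds(70e9, 250_000, tflops=30.0, n_devices=n)
        print(f"{n} device(s): prefill ~{t / 60:.1f} min")
    print(f"decode ~{decode_tokens_per_second(70e9, 0.5, 800.0):.1f} tok/s")
```

With those placeholder inputs, a single device spends roughly 39 minutes on prefill before emitting anything, while decode tops out around 16 tokens/s. Adding machines divides the (idealized) prefill time, but simply pipeline-splitting the model does not make each generated token faster, since every token still has to pass through all of the weights in sequence.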