The Mac is very feeble compared to the big iron that the providers run, so performance will be much lower. Also, many companies would prefer that their engineers work on domain problems rather than on novel LLMs.
The Mac Studio (and the DGX Spark, for that matter) isn't running SOTA-level models, not by a large margin. Time is money, and waiting on these half-baked solutions is a waste of both.
With the Mac Studio in particular, the GPU is far too weak for enterprise-scale context prefill. You'd need 2 or 4 Studios to process 250k-token contexts quickly, and even then you'd be bottlenecked by the relatively slow memory bandwidth during the decode stage. It is simply terrible hardware for fast or power-efficient inference.
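To put rough numbers on the prefill vs. decode point, here is a back-of-envelope sketch. It assumes the usual rule of thumb that prefill is compute-bound (about 2 FLOPs per parameter per prompt token, ignoring the quadratic attention term) and that single-stream decode is memory-bound (every generated token streams all the weights once, ignoring KV-cache reads). The function names and every hardware figure below are illustrative placeholders, not measured specs for any particular machine.

```python
def prefill_seconds(params: float, prompt_tokens: int, flops_per_s: float) -> float:
    """Compute-bound estimate: ~2 FLOPs per parameter per prompt token."""
    return 2 * params * prompt_tokens / flops_per_s


def decode_tokens_per_second(params: float, bytes_per_param: float,
                             bandwidth_bytes_per_s: float) -> float:
    """Memory-bound estimate: each new token reads all weights once (batch size 1)."""
    return bandwidth_bytes_per_s / (params * bytes_per_param)


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not vendor specs):
    params = 70e9            # hypothetical 70B-parameter dense model
    prompt_tokens = 250_000  # the 250k context mentioned above
    sustained_flops = 50e12  # assumed sustained compute, FLOP/s
    bandwidth = 800e9        # assumed memory bandwidth, bytes/s
    bytes_per_param = 0.5    # assumed 4-bit quantized weights

    t = prefill_seconds(params, prompt_tokens, sustained_flops)
    tps = decode_tokens_per_second(params, bytes_per_param, bandwidth)
    print(f"prefill: ~{t:.0f} s, decode: ~{tps:.1f} tok/s")
```

With those placeholder inputs, prefill alone comes out to roughly 700 seconds for a single box, which is why you'd want several of them for long contexts. Adding boxes can split the prefill compute, but the decode estimate depends only on bandwidth and weight size, so it stays the bottleneck. Swap in real numbers for your own hardware before drawing conclusions.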