But with macOS hardware? What kind of calculations did you do that shows that this makes sense? 6-7 months ago I looked into the same, getting a bunch of Apple hardware to do local LLM inference (for a client), but the bad performance + high pricing just makes it completely infeasible compared to nvidia hardware.
Now I'm really curious how you got that to make sense in reality.
250+ employees on the lowest paid subscriptions of claude or openai is $5k/month. their usage of AI is chatbot most of the time which local models/inference can easily handle. so just being able to cancel those subscriptions with local hardware makes my break even on a single $10k mac studio 2 months and honestly the mac studio for their use is overkill.
Now I'm really curious how you got that to make sense in reality.