My instinct is that it would be cheaper overall to buy API credits when needed than to buy a top-of-the-line GPU that sits idle for most of the day. Going through an API also opens up access to larger models.
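For anyone who wants to put numbers on that instinct, here is a rough break-even sketch. The function name and every figure in it are placeholders I made up for illustration, not real prices; plug in your own GPU cost, monthly API bill, and electricity cost.

```python
def breakeven_months(gpu_cost, monthly_api_spend, monthly_power_cost=0.0):
    """Months until a one-off GPU purchase costs less than paying for API usage.

    All inputs are in the same currency. Returns infinity when running
    locally saves nothing per month (power cost >= API spend).
    """
    monthly_saving = monthly_api_spend - monthly_power_cost
    if monthly_saving <= 0:
        return float("inf")
    return gpu_cost / monthly_saving


# Placeholder numbers only: a 2000 GPU, 20/month of API credits, 5/month of power.
months = breakeven_months(gpu_cost=2000, monthly_api_spend=20, monthly_power_cost=5)
print(f"Break-even after about {months:.0f} months")
```

This ignores resale value, price changes, and the quality gap between local and hosted models, so treat it as a first-pass estimate only.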
It's a choice. Running locally means privacy and control over your own data. It could also make compliance easier at any enterprise that doesn't want to share data with a third party.
This agrees with my own experience. I have a 4070 Super, which of course is nothing to brag about, but tokens per second with a quantized 27B model is miserable. I could go down to 12B or even smaller, but that would sacrifice quality. I could also upgrade my gear, but I realize that however much I spend, the experience is not going to be as smooth as off-the-shelf LLM products, and it's definitely not worth the cost.
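A back-of-the-envelope check suggests why the 27B model struggles on that card. The sketch below assumes 4-bit quantization (the comment only says "quantized") and counts weights only, ignoring the KV cache and runtime overhead; `weight_gb` is a hypothetical helper name. The 4070 Super has 12 GB of VRAM.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


VRAM_GB = 12  # RTX 4070 Super
BITS = 4      # assumed quantization level, not stated in the thread

for size in (27, 12):
    need = weight_gb(size, BITS)
    fits = need < VRAM_GB
    print(f"{size}B at {BITS}-bit: ~{need:.1f} GB of weights, fits in {VRAM_GB} GB VRAM: {fits}")
```

Under these assumptions the 27B weights alone exceed the card's memory, which usually means part of the model spills to system RAM and throughput drops, while a 12B model fits with room to spare.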
Of course it is nice to have an LLM running locally where nobody else gets to see your data. But I don't see the point of spending thousands of dollars to do that.