
I imagine it's the lack of transparency. The costs are obviously coming down as people figure out how to tune both hardware and software. But there are costs other than just electricity as well. For example, chips do burn out. I recall reading that 2 to 3 years is roughly what you can expect under inference loads, so replacing chips is a non-trivial operational cost.

Also, as the costs of running this stuff come down, the incentive to rent models goes down with them. Running local models has the benefit that you get to keep your data local, you can tune them to do what you like, and you're not subject to model or price changes down the road. This makes self-hosting appealing both to individuals and companies. Currently, the barrier is the significant resources needed to run the models, but companies are already increasingly self-hosting open models. And local inference that regular people can run is becoming a possibility as well.

While I'm sure there's always going to be a market for renting out models as a service, it may shrink significantly as the costs continue to come down.