| True, but this is not only a trade-off between opex and capex. Local inference with open-weight models provides guaranteed performance that stays stable over time and is available at any moment. As many current HN threads show, depending on external AI inference providers is extremely risky: their performance can degrade or their prices can rise at any time, equally unpredictably. Making your programming workflow dependent on a subscription is a huge bet that you will gain more from the slightly higher quality of the proprietary models than you will lose if the service degrades in the future. As recent history has shown, many have already lost this bet. I am not a gambler, so I have made my choice: local AI inference, using a variety of models depending on the task, i.e. small models that run entirely on relatively cheap GPUs (like the new Intel GPUs), medium models that need e.g. 128 GB of RAM when run on a CPU, and huge models that must be stored on fast SSDs (e.g. interleaved across multiple PCIe 5.0 SSDs). Such a strategy is achievable with a modest capex, in the lower half of the 4-digit range. |
Personally, I hope we see a third way: strong open-weight models hosted by a variety of companies that actually compete on price and nines of availability. That way, capex-heavy GPUs are fully utilized and users can rent intelligence as a commodity.
There is a very apt analogy in virtual server hosting: VPS and shared web hosting are commodities, and it does not make financial sense for most users to host their websites on their own physical servers in their basements.