| > API that is likely a loss-leader to grab market share (hosted LLM cloud models). I don't think so, not anymore. If you look at API providers that host open-source models, you will see that they have a very healthy margin between their API price and their inference hardware cost (which is, of course, not their only cost) [1]. And that does not take into account any proprietary inference optimizations they have. As for closed-model API providers like OpenAI and Anthropic, you can make an educated guess based on the not-so-secret information about their model sizes. As far as I know, Anthropic has extremely good margins between API price and inference hardware cost. [1]: This is something you can verify yourself if you know what it costs, hardware-wise, to run those models in production at scale. Even assuming off-the-shelf software, they are doing well. |
Yeah, people tout RAG and fine-tuning, but lots of people just use the base chat model, and if it isn't kept up to date on new data, it falls behind. How much are these companies spending just keeping up with the Joneses?