by bee_rider 381 days ago
> Yes, capex not opex. The cost of running inference is opex.

This seems sort of interesting, maybe (I don’t know business, though). I agree that the cost of running inference is part of the opex, but saying that doesn’t rule out putting other stuff in the opex bucket.

Currently these LLM companies train models on rented Azure nodes in an attempt to stay at the head of the pack, to be well positioned for when LLMs become really useful in a “take many white collar jobs” sense, right?

So, is it really obvious what’s capex and what’s opex? In particular:

* The nodes used for training are rented, so that’s opex, right?

* The models are in some sense consumable? Or at least temporary. I mean, they aren’t cutting edge anymore after a year or so, and the open weights models are always sneaking up on them, so at least they aren’t a durable investment.


> The nodes used for training are rented, so that’s opex, right?

It’s capex. They are putting money in, and getting an asset out (the weights).

> The models are in some sense consumable?

Assets depreciate.

Obsolete software doesn’t depreciate like obsolete hardware. If an LLM company has trained a truly better model, they can simply make as many copies of their own model as they want. Thus, if the new model is truly better in every way, the old one is completely valueless to them (of course there might be some tradeoffs which mean older models can stick around because they are, say, smaller… but, ultimately they will be valueless after some time).

Because models are still being obsoleted every couple years, old models aren’t an asset. They are an R&D byproduct.

> the old one is completely valueless to them

This is of course untrue for the same reason that people are still running Windows 2000.

> This is of course untrue for the same reason that people are still running Windows 2000.

What is the reason?

They’ve built processes around it and don’t feel like/can’t afford to/don’t know how to change them.

I guess we’ll see how that shakes out.

Because models are getting much better every couple months, I wonder if getting too attached to a process built around one in particular is a bad idea.

I would agree if Windows 2000 had the exact same APIs as the next version, but it doesn't. LLMs are text in -> text out, and you can drop in a new LLM and replace them without changing anything else. If anything, newer LLMs will just have more capabilities.
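
To make the "text in -> text out" point concrete, here is a minimal sketch of what "drop in a new LLM" could look like in code. Everything in it is hypothetical: `TextModel`, `OldModel`, `NewModel`, and `summarize_ticket` are made-up names for illustration, and the two model classes are stubs rather than calls to any real provider.

```python
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical interface: anything that maps a prompt string to a reply string."""

    def complete(self, prompt: str) -> str: ...


class OldModel:
    """Stub standing in for an older model (assumption: no real backend is called)."""

    def complete(self, prompt: str) -> str:
        return f"[old] {prompt}"


class NewModel:
    """Stub standing in for a newer model with the same text-in, text-out shape."""

    def complete(self, prompt: str) -> str:
        return f"[new] {prompt}"


def summarize_ticket(model: TextModel, ticket: str) -> str:
    """A process that depends only on the interface, not on one particular model."""
    return model.complete(f"Summarize: {ticket}")


# Swapping models changes only the argument, not the process itself.
print(summarize_ticket(OldModel(), "printer is on fire"))
print(summarize_ticket(NewModel(), "printer is on fire"))
```

Note that this only shows the call shape staying the same across models. It says nothing about whether the replies are equivalent, which is the part a process built around one model would actually depend on.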

> when LLMs become really useful

It looks to me similar to the situation with that newly fashionable WWW thing in, say, 1998. Everybody tried to use it, in search of some magic advantage.

Take a look at the WWW heavyweights today: say, Amazon, Google, Facebook, TikTok, WeChat. Are the web technologies essential for their success? Very much so. But TCP/IP + HTML + CSS + JS are mere tools that enable their real technical and business advantages: logistics and cloud computing, ad targeting, the social graph, content curation for virality, strong vertical integration with financial and social systems, and other such non-trivial things.

So let's wait until a killer idea emerges for which LLMs are a key enabler, but not the centerpiece. Making an LLM the centerpiece is the same thinking that tried to make catchy domain names the centerpiece, leading to the dot-com crash.