But the costs of inference have gone down 20x to 30x over the years, so how can you tell it is nonviable? Unless you are saying they are not paying market rate for the inference.
So, they still booked up all the RAM and SSDs in the world and are still going to use gigawatts of power. The price of energy production is not going to go down 20x or 30x; a lower cost just means they can cram more inference into the same energy consumption. But they aren't paying the market rate for inference, because everything is subsidized with debt and investors' money to scale as fast as possible. They are flush with money, and that is why they can book up all silicon production.
This claim sounds extremely fanciful when AI companies bleed money, and will keep bleeding money for the foreseeable future.
I don't pretend to know the future. Maybe LLMs become economically viable and are the future, maybe not. I don't really care either way, to be frank.
And I use LLMs, btw. I pay for a ChatGPT account, but I find it only moderately useful. At every renewal date I sort of ask myself whether it is worth the 20 bucks I spend monthly on it.
In no small part I keep using it to stay up to date on the best practices for using LLMs, in case they become standard.
The graph you linked seems to compare different OpenAI models in terms of "price per million tokens".
I am very skeptical of any financial information that comes from OpenAI. I have no idea how truthful those numbers are, or how creatively they can be collected to paint a rosier future for them.
Even if the numbers are truthful, I have no idea how they calculate price there. Is it in terms of the cost of the compute they rent? Is this cost subsidized or not?
Also, I don't know this "epoch.ai" website, and I don't know their stance. The website name itself does not inspire confidence in their reporting of anything related to AI. "Eat meat, says the butcher" vibes and all.
You can claim that the AI bleeds money because training is expensive, but inference is cheap. So it will only be financially viable when they stop training models? So they would need to stop improving their capabilities entirely for it to make any sense, is that your claim?
Even if I take this claim at face value (and that would take a lot of faith I don't have to give), it doesn't sound as good as you think it does.
>To analyze the decline in LLM prices over time, we focused on the most cost-effective LLMs above a certain performance threshold at each point in time. To identify these models, we iterated through models sorted by release date. In each iteration, we added a model to the set of cheapest models if it had a lower price than all previous models that scored at or above the threshold.
Can you look at the analysis? It will make it clear. I mean, it's so obvious, because GPT 4 costs way more than GPT 5.2-mini but has much worse performance.
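For anyone who wants to see what that quoted selection procedure does mechanically, here is a minimal sketch of one reading of it. The model names, dates, prices, and scores are all made up for illustration, and `cheapest_frontier` is a hypothetical helper, not the analysis's actual code. It also assumes a model has to clear the threshold itself before it can be added.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Model:
    name: str
    released: date
    price_per_mtok: float  # list price, USD per million tokens
    score: float           # benchmark score, higher is better


def cheapest_frontier(models, threshold):
    """Walk models by release date and keep each one that sets a new
    price low among models scoring at or above the threshold."""
    frontier = []
    best_price = float("inf")
    for m in sorted(models, key=lambda m: m.released):
        if m.score < threshold:
            continue  # assumption: models below the threshold are ignored
        if m.price_per_mtok < best_price:
            frontier.append(m)
            best_price = m.price_per_mtok
    return frontier


# Entirely hypothetical data, only to exercise the procedure.
models = [
    Model("model-a", date(2023, 3, 1), 30.0, 70),
    Model("model-b", date(2023, 9, 1), 10.0, 60),  # cheap, but below threshold
    Model("model-c", date(2024, 2, 1), 40.0, 80),  # capable, but not cheaper
    Model("model-d", date(2024, 8, 1), 3.0, 72),
]

frontier = cheapest_frontier(models, threshold=65)
price_decline = frontier[0].price_per_mtok / frontier[-1].price_per_mtok
print([m.name for m in frontier], price_decline)
```

Note that the input is the listed price per token, so whatever decline this produces is a decline in what customers are charged at a fixed capability level.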
>Even if the numbers are truthful, I have no idea how they calculate price there. Is it in terms of the cost of the compute they rent? Is this cost subsidized or not?
Do you think they are subsidising 900x or simply that the costs have gone down?
Overall you have shown what I feel is extreme skepticism in something that is obvious. You can literally run a model on your laptop that matches an older closed model. Costs are obviously going down, and I have shown data. Use your own anecdotes and report.
Extreme skepticism of this kind doesn't help.
> Overall you have shown what I feel is extreme skepticism in something that is obvious.
I think you show extreme faith in something that is very obscure.
For me to believe in the analysis I would need to trust the numbers that the analysis is based upon. I see no reason why I should trust this. What sort of regulatory body or neutral third party inspects those numbers to ensure they are not a fabrication?
But you can claim I am a hater if it justifies your worldview. Skepticism is sinful for the believer.
> For our language model benchmarking, we note that we consider endpoints to be serverless when customers only pay for their usage, not a fixed rate for access to a system. Typically this means that endpoints are priced on a per token basis, often with different prices for input and output tokens.
Okay, correct me if I am wrong: so this is measuring the inference costs for clients of AI services, not the inference costs that the AI service itself incurs when it offers the service?
I mean, the other guy's claim is that inference costs had come down 20x-30x. But the analysis, if I understood correctly, is based on how much clients are paying for it, not how much it actually costs.
I can charge you 20x less for a service and have massive losses for it.
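To put rough numbers on the difference between price and cost, here is a minimal sketch. Every figure in it is hypothetical, and the two helper functions are made up for illustration; nothing here comes from any provider's real accounts.

```python
def gross_margin(price, cost):
    """Fraction of revenue left after serving cost; negative means a loss."""
    return (price - cost) / price


def implied_subsidy_multiple(price, cost):
    """How many times the serving cost exceeds the price charged."""
    return cost / price


# Hypothetical figures, USD per million tokens.
old_price, new_price = 20.0, 1.0  # a 20x cut in list price

# Scenario 1 (assumed): serving cost really fell, to 0.50.
margin_if_cost_fell = gross_margin(new_price, 0.5)

# Scenario 2 (assumed): serving cost stayed at 10.00 and the gap is funded.
margin_if_cost_flat = gross_margin(new_price, 10.0)
subsidy_if_cost_flat = implied_subsidy_multiple(new_price, 10.0)

print(margin_if_cost_fell, margin_if_cost_flat, subsidy_if_cost_flat)
```

Both scenarios show the same 20x drop on the price list. Only the cost figure, which outsiders do not see, tells them apart, and the second scenario requires someone to cover ten times the revenue.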
It could be that OpenAI is subsidising their models by _fifty times_. Do you really think they are doing that? In some cases the costs went down by 200x. Do you really think OpenAI is subsidising their models by 200x?
It's easier to just admit that technological advances helped decrease the cost instead of coming up with more complicated reasons like VC funding, subsidies and so on.
For instance, take DeepSeek and other open-source models: even they have reduced their costs by a huge margin. What explanation is there for open-source models?