No, it feels more like the disconnect is that I think they're all compute-limited and you maybe don't? Almost every flop they use to serve a query at a loss is a flop they didn't use for training, research, or for queries that would have given them data to enable better training.

Like, yes, if somebody has 100k H100s and is only able to find a use for 10k of them, they'd better find some scale fast; and if that scale comes from increasing inference workloads by 10x, there are going to be efficiencies to be found. But I don't think anyone has an abundance of compute. If you've instead got 100k H100s but demand for 300k, you need to be making tradeoffs. I think loss-making paid inference is fairly obviously the worst way to allocate the compute, so I don't think anyone is doing it at scale.

> I think deepseek instead just showed they haven't really bothered yet.

I think they've all cared about aggressively optimizing inference costs, though with varying levels of success. Even if they're still in a phase where they literally do not care about the P&L, lower costs are highly likely to also mean higher throughput. Getting more throughput from the same amount of hardware is valuable for all their use cases, so I can't see how it couldn't be a priority, even if the improved margins are just a side effect.

(This does seem like an odd argument for you to make, given you've so far been arguing that of course these companies are selling at a loss to get more scale so that they can get better margins.)

> - You are hyper fixated on tokens, and not that owning a large % of distribution lets them sell other things. Eg, instead of responding to my point 2 here, you are again talking about token margin. Apple doesn't have to make money on transistors when they have a 30% tax on most app spend in the US.

I did not engage with that argument because it seemed like a sidetrack from the topic at hand (which was very specifically the unit economics of inference). Expanding the scope will make convergence less likely, not more.

There's a very good reason all the labs are offering unmonetized consumer products despite losing a bundle on those products, but that reason has nothing at all to do with whether paid inference is profitable. They're totally different products with different market dynamics. Yes, OpenAI owning the ChatGPT distribution channel is hugely valuable for them long-term, which is why they're prioritizing growth over monetization. That growth is going to be sticky in a way that APIs can't be.

Thanks, good discussion.