

by jsnell 383 days ago

I don't think I was ignoring your points. I thought I was replying very specifically to them, to be honest, and providing very specific arguments. Arguments that you, by the way, did not respond to in any way here, beyond calling them "[you] don't even know what". That seems quite rude, but I'll give you the benefit of the doubt.

Maybe if you could name one of those potential opportunities, it'd help ground the discussion in the way that you seem to want?

Like, let's say that additional volume means one can do more efficient batching within a given latency envelope. That's an obvious scale-based efficiency. But a fuller batch isn't actually valuable in itself: it's only valuable because it allows you to serve more queries.

But why? In the world you're positing where these queries are sold at negative margins and don't provide any other tangible benefit (i.e. cannot be used for training), the provider would be even better off not serving those queries. Or, more likely, they'd raise prices such that this traffic has positive margins, and they receive just enough for optimal batching.
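To make the batching point concrete, here is a rough sketch with made-up numbers (the accelerator cost, throughput, and prices are all assumptions, not figures from this thread). It assumes an accelerator costs the same per hour however full its batches are, and that throughput scales linearly with batch fill. Fuller batches lower the cost per query, but if the price is below the cost per query even at full batches, the best move is not to serve at all.

```python
# Hypothetical numbers; none of these figures come from the thread.
GPU_COST_PER_HOUR = 2.00             # assumed all-in cost of one accelerator-hour
FULL_BATCH_QUERIES_PER_HOUR = 1000   # assumed throughput with completely full batches


def cost_per_query(batch_fill: float) -> float:
    """Cost of one query when batches are `batch_fill` (0 < fill <= 1) full.

    Assumes the accelerator costs the same per hour however full the
    batches are, and that throughput scales linearly with fill.
    """
    if not 0 < batch_fill <= 1:
        raise ValueError("batch_fill must be in (0, 1]")
    return GPU_COST_PER_HOUR / (FULL_BATCH_QUERIES_PER_HOUR * batch_fill)


def hourly_profit(price_per_query: float, batch_fill: float) -> float:
    """Revenue minus accelerator cost for one hour at the given fill."""
    queries = FULL_BATCH_QUERIES_PER_HOUR * batch_fill
    return price_per_query * queries - GPU_COST_PER_HOUR


def best_choice(price_per_query: float) -> str:
    """Serve only if full batches beat not running the accelerator at all."""
    return "serve" if hourly_profit(price_per_query, 1.0) > 0 else "do not serve"


if __name__ == "__main__":
    for fill in (0.25, 0.5, 1.0):
        print(f"fill={fill:.2f} cost/query={cost_per_query(fill):.4f}")
    for price in (0.0015, 0.0030):
        print(f"price={price:.4f} -> {best_choice(price)}")
```

In this toy model a fuller batch only helps when the price already clears the full-batch cost per query, which is the case where the traffic is profitable anyway.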

> You can claim these opportunities aren't enough, but you don't seem to be willing to do so.

Why would I claim that? I'm not saying that scaling is useless. I think it's incredibly valuable. But scale from these specific workloads is only valuable because these workloads are already profitable. If they weren't, the scarce compute would be better spent on one of the other compute sinks I listed.

(As an example, getting more volume to more efficiently utilize the demand troughs is pretty obviously why basically all the major providers have some sort of batch/off-peak pricing plans at very substantial discounts. But it's not something you'd see if their normal pricing had negative margins.)

> Engineering opportunities at volume and high skill allow changing the margin in ways low volume and low capitalization provider cannot.

My point is that not all volume is the same. Additional volume from users whose data cannot be used to improve the system and who are unprofitable doesn't actually provide any economies of scale.

> 2. Again, even if tokens are unprofitable at scale (which I doubt),

If you doubt they're unprofitable at scale, it seems you're saying that they're profitable at scale? In that case I'd think we're actually in violent agreement. Scaling in that situation will provide a lot of leverage.


lmeyerov 383 days ago

> But why? In the world you're positing where these queries are sold at negative margins and don't provide any other tangible benefit (i.e. cannot be used for training), the provider would be even better off not serving those queries. Or, more likely, they'd raise prices such that this traffic has positive margins, and they receive just enough for optimal batching. ... But scale from these specific workloads is only valuable because these workloads are already profitable

I'm disputing this two-fold:

- Software tricks like batching and hardware like ASICs mean what is negative/neutral for a small or unoptimized provider is eventually positive for a large, optimized provider. You keep claiming they cannot do this with positive margin for some reason, or only if already profitable, but those are unsubstantiated claims. Conversely, I'm giving classic engineering principles for why they can keep driving down their COGS to flip to profitability as long as they have capital and scale. This isn't selling $1 for $0.90, because there is a long way to go before their COGS are primarily constrained by the price of electricity and sand. Instead of refuting this... you just keep positing that it's inherently negative margin.

In a world where inference consumption just keeps going up, they can keep pushing the technology advantage and creating even a slight positive margin goes far. This is the classic engineering variant of buttoning margins before an IPO: if they haven't yet, it's probably because they are intentionally prioritizing market share growth for engineering focus vs cost cutting.

- You are hyper fixated on tokens, and not that owning a large % of distribution lets them sell other things. E.g., instead of responding to my point 2 here, you are again talking about token margin. Apple doesn't have to make money on transistors when they have a 30% tax on most app spend in the US.


lmeyerov 383 days ago

Maybe this is the disconnect for the token side: you seem to think they can't keep improving the margin to reach profitability, that margins are static and will only get worse, not better.

I think deepseek instead just showed they haven't really bothered yet. They'd rather focus on growing, and capital is cheap enough for these firms that optimizing margins is relatively distracting. Obviously they do optimize, but probably not at the expense of velocity and growth.

And if they do seriously want to tackle margins, they should pull a groq/Google and go aggressively deep. Ex: fab something. Which... they do indeed fundraise on.


jsnell 383 days ago

No, it feels more like the disconnect is that I think they're all compute-limited and you maybe don't? Almost every flop they use to serve a query at a loss is a flop they didn't use for training, research, or for queries that would have given them data to enable better training.

Like, yes, if somebody has 100k H100s and is only able to find a use for 10k of them, they'd better find some scale fast; and if that scale comes from increasing inference workloads by 10x, there are going to be efficiencies to be found. But I don't think anyone has an abundance of compute. If you've instead got 100k H100s but demand for 300k, you need to be making tradeoffs. I think loss-making paid inference is fairly obviously the worst way to allocate the compute, so I don't think anyone is doing it at scale.
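As a rough illustration of that tradeoff, here is a toy allocation with 100k GPUs and 300k of demand. The workload names, the demand split, and the per-GPU values are all hypothetical; only the 100k/300k totals come from the paragraph above. It is a simple greedy ranking, not a claim about how any lab actually schedules compute.

```python
from typing import NamedTuple


class Workload(NamedTuple):
    name: str
    gpus_wanted: int
    value_per_gpu: float  # assumed net value per GPU; negative means loss-making


def allocate(total_gpus: int, workloads: list[Workload]) -> dict[str, int]:
    """Greedy allocation: highest value per GPU first, nothing to loss-makers."""
    remaining = total_gpus
    plan: dict[str, int] = {}
    for w in sorted(workloads, key=lambda w: w.value_per_gpu, reverse=True):
        granted = min(w.gpus_wanted, remaining) if w.value_per_gpu > 0 else 0
        plan[w.name] = granted
        remaining -= granted
    return plan


# Hypothetical demand adding up to 300k GPUs against 100k of supply.
DEMAND = [
    Workload("training", 60_000, 5.0),
    Workload("research", 40_000, 4.0),
    Workload("data-generating inference", 80_000, 3.0),
    Workload("profitable paid inference", 70_000, 2.0),
    Workload("loss-making paid inference", 50_000, -1.0),
]

if __name__ == "__main__":
    for name, gpus in allocate(100_000, DEMAND).items():
        print(f"{name}: {gpus}")
```

Under these assumed values the supply runs out after training and research, and the loss-making workload gets nothing however much of it is demanded.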

> I think deepseek instead just showed they haven't really bothered yet.

I think they've all cared about aggressively optimizing for inference costs, though to varying levels of success. Even if they're still in a phase where they literally do not care about the P&L, cheaper costs are highly likely to also mean higher throughput. Getting more throughput from the same amount of hardware is valuable for all their use cases, so I can't see how it couldn't be a priority, even if the improved margins are just a side effect.

(This does seem like an odd argument for you to make, given you've so far been arguing that of course these companies are selling at a loss to get more scale so that they can get better margins.)

> - You are hyper fixated on tokens, and not that owning a large % of distribution lets them sell other things. E.g., instead of responding to my point 2 here, you are again talking about token margin. Apple doesn't have to make money on transistors when they have a 30% tax on most app spend in the US.

I did not engage with that argument because it seemed like a sidetrack from the topic at hand (which was very specifically the unit economics of inference). Expanding the scope will make convergence less likely, not more.

There's a very good reason all the labs are offering unmonetized consumer products despite losing a bundle on those products, but that reason has nothing at all to do with whether inference when it is being paid for is profitable or not. They're totally different products with different market dynamics. Yes, OpenAI owning the ChatGPT distribution channel is vastly valuable for them long-term, which is why they're prioritizing growth over monetization. That growth is going to be sticky in a way that APIs can't be.

Thanks, good discussion.


lmeyerov 383 days ago

I agree they are compute limited and disagree that they are aggressively optimizing. Many small teams are consistently showing many optimization gain opportunities all the way from app to software to hardware, and deepseek was basically just one especially notable example of many. In my experience, there are levels of effort to get corresponding levels of performance, and with complexity slowdowns on everyone else, so companies are typically slow-but-steady here, esp when ZIRP rewards that (which is still effectively in place for OpenAI). Afaict OpenAI hasn't been pounding on doors for performance people, and generally not signalling they go hard here vs growth.

Re: Stickiness => distribution leadership => monetization, I think they were like 80/20 on UI vs API revenue, but as a leader, their API revenue is still huge and still growing, esp as enterprises advance from POCs. They screwed up the API market for coding and some others (voice, video?), so afaict are more like "one of several market share leaders" vs "leading". So the question becomes: Why are they able to maintain high numbers here, e.g., is momentum enough so they can stay tied in second, and if they keep lowering costs, stay there, and enough so it can stay relevant for more vertical flows like coding? Does bundling UI in enterprise mean they stay a preferred enterprise partner? Etc. Oddly, I think they are at higher risk of losing the UI market more so than the API market bc an organizational DNA change is likely needed for how it is turning into a wide GSuite / Office scenario vs simple chat (see: Perplexity, Cursor, ...). They have the position, but it seems more straightforward for them to keep it in API vs UI.
