by hedora 2 days ago
What moat? There are multiple companies providing pareto-optimal frontier models, and it takes O(10) people to build one of these things.

The rest is capital intensive, and the price will approach the cost of production over time.

Thinking this is a profitable endeavor is equivalent to claiming coal plants have good margins because boilers are expensive.


I think we agree?

What moat? You answered yourself: "capital intensive"

But, history says the supercomputer of today will fit in your pocket in a few years.

They've bought up all the RAM and GPUs, which pushes the capital requirements upward for everyone else. But, they can't corner the market forever; there are too many competing interests. AMD and Intel keep making new GPUs and APUs. The memory makers can't just sell only to AI companies forever; if they do, Chinese manufacturers will move in and eventually eat them from below (as has happened many times before).

They have a moat today: it's just that it's really expensive to train and host frontier models, especially at commercial scale. There also used to be some secret sauce to making it fast and efficient. But, secret sauce is being published daily by all sorts of researchers; folks are figuring out how to do more with less, and it often finds its way into llama.cpp, vLLM, or SGLang within days or weeks.

> But, history says the supercomputer of today will fit in your pocket in a few years.

I don't think this will be true in the same time span anymore. Each miniaturization is costing more and more money.

Perhaps they'll come up with exotic fundamental improvements, but I don't think the rate of improvement of compute/watt will match the previous decades.

Yeah, that's probably true, but we're also seeing that there are still tons of inefficiencies in how LLMs are being run. It seems like every couple of months there's some new technique to squeeze more performance out of less hardware: KV caching improvements, fast attention, speculative decoding, dynamic quantization, quantization-aware training, etc.

That said, I recently replaced my five year old self-built PC (with a top-of-the-line desktop CPU, chipset, memory, and GPU of the time) with a new everything-the-best build, and while it's clear we're not keeping up with Moore's Law anymore, it's still 4-5 times faster for compute-intensive stuff, especially parallelizable tasks. We're still getting faster/cheaper. So, the time scale is maybe ten years rather than five.
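For a rough sense of what "4-5 times faster in five years" means as a rate, here is a back-of-the-envelope sketch in Python. It assumes smooth compound growth, which real hardware generations don't follow, and the 24- and 18-month doubling periods are only the two commonly quoted readings of Moore's law, used as reference points.

```python
import math

def annual_rate(total_speedup: float, years: float) -> float:
    """Compound yearly speedup implied by `total_speedup` over `years`."""
    return total_speedup ** (1.0 / years)

def doubling_time(total_speedup: float, years: float) -> float:
    """Years per 2x at that same compound rate."""
    return years * math.log(2) / math.log(total_speedup)

# The "4-5 times faster" after five years, from the comment above.
for speedup in (4.0, 5.0):
    print(f"{speedup:.0f}x in 5 years -> {annual_rate(speedup, 5):.2f}x/year, "
          f"2x every {doubling_time(speedup, 5):.2f} years")

# Reference points: what 5 years gives at a 24-month and an 18-month doubling period.
for months in (24, 18):
    print(f"{months}-month doubling over 5 years -> {2 ** (60 / months):.1f}x")
```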

It's highly unlikely AI inference doesn't follow the same path as general-purpose computing: variety and innovation in software lead to standardization on the highest-performance approaches.

As that transition happens, hardware evolves from general-purpose (because nobody knows what's needed and hardware design is slow) to fixed-function, high-performance (once requirements are better defined).

GPUs (and TPUs) are a weird middle ground here, as they're already fairly specialized, but I wouldn't bet against next-gen, inference-optimized hardware architectures dominating that use case in ~10 years if the pace of AI architecture tweaking slows.

The efficiency/power/cost gains from fixed-function optimization are always too great, and the only thing that holds that approach back is rapidly mutating requirements.

Really the biggest concerns are not computers getting spectacularly faster, but 'intelligence' algorithms getting orders of magnitude better.

Drop the power requirements 1000-fold, and yeah, you will be able to make your own SOTA model on the cheap. The problem is that the person who has a few exaflops of compute will still leave you in the dust in the intelligence explosion that would follow an event like this.

Depends on the intelligence vs. compute scaling law, which I think no one really knows. Pretty likely to be some degree of diminishing returns, but how much? Is it logarithmic, inverse quadratic, …
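To make "how much diminishing returns" concrete, here is a toy comparison in Python of two hypothetical capability-vs-compute curves. Neither is a measured scaling law; the saturating curve and its `alpha` exponent are made up purely for illustration.

```python
import math
from typing import Callable

# Two hypothetical "capability vs compute" curves (illustrative, not fitted to anything).
def logarithmic(compute: float) -> float:
    # Every 10x of compute buys the same fixed increment.
    return math.log10(compute)

def saturating(compute: float, alpha: float = 0.5) -> float:
    # Approaches a ceiling of 1.0; each 10x buys less than the last.
    return 1.0 - compute ** -alpha

def gains_per_decade(curve: Callable[[float], float], decades: int = 4) -> list[float]:
    """Capability gained by each successive 10x increase in compute, starting at 1."""
    return [curve(10.0 ** (k + 1)) - curve(10.0 ** k) for k in range(decades)]

print("log:       ", [round(g, 3) for g in gains_per_decade(logarithmic)])
print("saturating:", [round(g, 3) for g in gains_per_decade(saturating)])
```

Note that the logarithmic curve already has diminishing returns per unit of compute: each 10x buys the same increment, so each increment costs ten times more than the previous one.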

If training models gets way cheaper, I would expect the diminishing returns to get steeper too.

And you're right, no one has any clue what the limits of intelligence are. Though it seems odd to me that humanity would have reached the pinnacle of it in the last million years or so, after a few billion years of life's development. It just seems improbable that we are close to the limits.
I am not making an argument about limits. I just expect some degree of diminishing returns.

A related argument is speed of intelligence vs. capability at that speed. You can think of a three-way trade-off between latency, cost, and capability that is unlikely to be linear in any dimension and that changes in steps as technology or biology evolves.

Ultimately relating to the properties of the computing substrate and almost certainly bounded by some kind of thermodynamic limits that present systems do not approach.

>Pretty likely to be some degree of diminishing returns

Intelligence may be different. If we look at biological brains, do we get diminishing returns or the completely opposite scaling law when we compare our brain against, say, a gorilla's?

Interesting thought to consider in principle but fails because gorilla brains continued to evolve too, just along a different path. They're not snapshots of ancestral species locked in time.
Single clock speed hasn't had much of an upgrade, but the architecture for doing exactly what they are doing? That will improve for at least 5-10 years. There are both huge power gains from Processing-in-Memory (PIM) chips (70-80% energy savings) and engineering improvements that make memory cheaper and cheaper.
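Converting that 70-80% energy figure into a perf/watt multiple, as a quick sketch. It assumes, optimistically, that the saving applies to the whole workload rather than only to the memory traffic PIM actually removes; `perf_per_watt_gain` is just an illustrative helper.

```python
def perf_per_watt_gain(energy_reduction: float) -> float:
    """Perf/watt multiplier if the same work needs (1 - energy_reduction) of the energy."""
    if not 0.0 <= energy_reduction < 1.0:
        raise ValueError("energy_reduction must be in [0, 1)")
    return 1.0 / (1.0 - energy_reduction)

# The 70-80% range quoted above (assumed to cover the entire workload).
for cut in (0.70, 0.80):
    print(f"{cut:.0%} less energy -> {perf_per_watt_gain(cut):.1f}x perf/watt")
```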
Yes, I'm talking about a supercomputer from today in your pocket. That probably requires at least 5000x perf/watt if not even more.
That’s only two orders of magnitude of software optimization, a bunch of plus deltas, and one order of magnitude on hw.

I’d give that over 50% odds of happening in the next few years.
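The arithmetic behind that split, sketched in Python using only the figures from the comment. The 1.5x per "plus delta" is an arbitrary illustrative size, not a claim about any specific technique.

```python
import math

target = 5000        # required perf/watt multiple, from the comment above
software = 10 ** 2   # "two orders of magnitude" from software optimizations
hardware = 10 ** 1   # "one order of magnitude" from hardware

# What the "plus deltas" have to cover once software and hardware are multiplied in.
remaining = target / (software * hardware)
print(f"left for the plus deltas: {remaining:.1f}x")

# Hypothetical: how many stacked 1.5x improvements would cover that remainder?
step = 1.5
n_steps = math.ceil(math.log(remaining) / math.log(step))
print(f"{n_steps} stacked {step}x wins give {step ** n_steps:.2f}x")
```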

I don't disbelieve a 5000x speedup is possible, I disbelieve that a modern day supercomputer will fit in your pocket in even the next 10 years.
That has never been true, unfortunately. The 2005 TOP500 was led by BlueGene/L, achieving 280 FP64 TFlop/s.

Apple is talking about 17.5 FP16 TFlop/s on the iPhone 17 Neural Engine. So 20 years later we are still nowhere near, not even at reduced precision.

That’s a factor of 10-20.

You can get an SoC that does 126 TOPS (Strix Halo) in a tablet form factor, which is a factor of two. (I’ll count them as equivalent ops, since software couldn’t use low-precision floating point back then.) So, not quite “pocket”, but probably “purse” and certainly “backpack”.
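The ratios quoted in this exchange are easy to check. The figures below are the ones given in the thread, not independently sourced, and they mix FP64, FP16, and integer ops exactly as the comment does.

```python
bluegene_l_2005 = 280.0  # FP64 TFLOP/s, 2005 TOP500 leader, as quoted
iphone_17_ne = 17.5      # FP16 TFLOP/s, iPhone 17 Neural Engine, as quoted
strix_halo = 126.0       # TOPS, counted as equivalent ops per the comment

print(f"phone gap:  {bluegene_l_2005 / iphone_17_ne:.1f}x")
print(f"tablet gap: {bluegene_l_2005 / strix_halo:.1f}x")
```

That works out to 16x for the phone and about 2.2x for the tablet-class SoC, in line with "a factor of 10-20" and "a factor of two".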

Because we’ve been able to spend more and more on the next miniaturization. That does not seem sustainable forever, or even physically possible to keep up indefinitely.
In five years I think you will be able to train a frontier modem for much less money than today, and the power-hungry hardware of today will be cheap secondhand because of its power usage.
There are probably better ways to communicate across a wire than having an LLM voltage-bang, but it's certainly an interesting use case.
>but I don't think the rate of improvement of compute/watt will match the previous decades.

Unless we invest heavily in research and find new ways to make chips. But I think there's not enough motivation and money to do that.

There's literally never been more money being thrown at that problem.
> I think we agree?

That is such a crazy way to start a response to someone trying to argue with you. I should try this. That's amazing. I know you didn't mean it as a trick, at least I'm pretty sure you meant it sincerely, but I'm just struck by the power of it to defuse and redirect the conversation. And this was a very low-grade example, but I could imagine this being useful in much more heated contexts.

I think in general stripping away the parts you agree with from the argument works great, because it strips away a whole lot of potential for ending up indirectly arguing over things that aren't in contention, and it often also defuses the rest when it turns out the core of the argument perhaps is much smaller than people are willing to get invested in.
How do you do that without sounding negative? By doing that, there's a risk of giving the general impression that "we didn't agree", since you basically focused on the disagreements.
"You're totally right about X and Y. I think the only thing we disagree about is Z". People like being told they're right, and you then downplay the importance of the actual remaining disagreement. Often that lowers the stakes for people. They've already "won" since you agreed with most of what they said, so the rest becomes less important.
Repeating back what someone said (specifically: trying to mirror their exact words as best you can remember them) also has proven psychological effects: increased empathy and calming of your own emotional response and theirs.

It's a component of a few psych frameworks around improving interpersonal conflict. Ref: https://hartsteinpsychological.com/the-power-of-active-liste...

Short template form is "What I think I heard you say is (repeat their words as exactly as possible)? Did I get that right?"

OTOH I have often witnessed people agreeing without realizing it. I’ve been able to defuse a bunch of arguments by pointing that out.
In fairness I completely agree with 99% of their comment.

I was nitpicking the use of the word “moat”. For it to be a moat, it’d need to be more expensive to traverse than to build.

Instead, the big AI firms are trying to create a monopoly on capital in an area where real costs are dropping 90% year over year.
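What a sustained 90% yearly drop does to the cost of catching up, as a minimal sketch. The $100M starting cost is hypothetical, and the constant rate is the comment's claim taken at face value rather than a forecast.

```python
def cost_after(initial_cost: float, years: int, annual_drop: float = 0.90) -> float:
    """Cost to reproduce the same result after `years` of a constant yearly price drop."""
    return initial_cost * (1.0 - annual_drop) ** years

# Hypothetical: a result that costs $100M to reach today.
for years in range(4):
    print(f"year {years}: ${cost_after(100e6, years):,.0f}")
```

Under that assumption, a follower three years behind pays about a thousandth of what the leader did, which is the sense in which the moat is cheaper to traverse than to build.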

Yeah, more valuable than the comments I came to read (even if those are interesting too!)
Usually people are taught these techniques in management courses. If you're at a BigCorp where they push managers through such courses, you can hear a lot of that stuff in your manager's speech if you pay attention to it.
The other half of the moat is the data they stole from everyone else, some of it illegally. So you can be sure they will do everything in their power to stop others from getting that data freely.
Yeah, I think a lot of the "slow down" rumblings we're hearing from OpenAI and Anthropic are really overtures toward regulatory capture; basically, "now that we're in the lead, we need to lock this shit down so nobody else can catch up."
I think they a) have increased the hype as much as humanly possible around incremental improvements mixed with regressions, b) know they will soon have to multiply their prices several times when the VC subsidies dry up, and c) will probably still need to partially close the faucet on compute. They’re priming us for a heroic explanation of why their service (not the models, the service) is simultaneously becoming a lot more expensive AND shittier.

“We’ve failed to deliver on 5 years of promises after wasting billions of dollars… sorry” is a death knell. However, “We’ve decided to not deliver on 5 years of promises after wasting billions of dollars… for safety… but keep those investments rolling in” is like crack to the true believers.

But... OpenAI and Anthropic can't stop China and the EU, can they?

Depends on your worldview; they might or might not come up with something better. But I guess we can agree nothing will stop them from _trying_?

The US successfully enforces the DMCA and other copyright rules on the EU while giving its own big tech a free pass now.

China will certainly compete though.

The EU is slowly getting out of the rectum of the United States. Let's see if that trend continues.
> They’ve bought up all the RAM and GPUs…

Is there an endgame where even this is considered overly complex? Instead of starving the competition by buying up all the compute, why not just buy up… all the money!? Hoover up as much investment capital as possible so that your competitors can’t get funding.

I assume this is an honest question, in which case the answer is that funding is not really finite.
Funding can be illiquid for limited spans of time, e.g. pre-IPO.

Anthropic / OpenAI / SpaceX going public makes it easier for capital to both flow to and away from them.

Or just "buy" your competition like big tech did.

Every major tech company literally has deals, ownership stakes, alliances, etc.

They're literally not going to gobble the competition up entirely and trigger an antitrust case.

They did get a bunch of investment grants from Trump, so your tax money (and power bills) are subsidizing them. They also arranged for ETFs to eliminate consumer protection rules to force everyone’s retirement to buy SpaceX/Anthropic/OpenAI shortly after IPO. That totals $3T in valuation (unless it goes up in first-week trading), so your retirement is basically going to be weighted toward “AI bubble”, similar to “MAGA”, and then everything else is rounding error. (The rule changes waive profitability requirements and shorten the cooldown from IPO to indexing from a year to weeks.)

I guess that’s one way to try to make capital finite.

>But, history says the supercomputer of today will fit in your pocket in a few years.

That was Moore's law saying that, and it seems Moore's law has slowed down quite a bit, for now.

Yes, but surely AI is going to save us from the bloated stack of modern software.
> But, history says the supercomputer of today will fit in your pocket in a few years.

Hmm, nooo?? Physics says otherwise.

> it takes O(10) people to build one of these things

To build a working prototype, sure. To operate at production scale, definitely not. The same rule would apply to WhatsApp and many other world-scale products. It turns out that the moment you need to monetize these machines, your O(10) stops working.

O(10) people?
So, a constant number of people.

(less facetiously, I think they mean "5 to 50")

Other models aren't even close, except for GPT 5.5. You're dead wrong on that. You read too many benchmarks and/or Chinese propaganda. There hasn't been a serious contender in agentic SWE besides OAI and Anthropic for a long time, and no Chinese model has even reached Opus 4.5 performance yet. The moat isn't insurmountable, but it is very solid for at least a 12-month lead time, which is an insane amount of time in this landscape and industry. The moat is stretching, not shrinking, on agentic SWE. And that is literally the only moat that matters for RSI.
DeepSeek 4 Pro is performing agentic SWE tasks for me quite well. It can't do everything Opus can do, but if OpenAI and Anthropic disappeared tomorrow, I'd figure out ways to make it work with harness improvements and other optimizations.

Anthropic can stretch the moat all they want, but in the department of trust, they put a final nail in their coffin today. Anthropic is pure evil at this point.

'Evil' lol. Every single corporation you deal with is evil then. It's greed, and almost every large model provider is guilty of it. China is all open source right now. Cool! Gee, I wonder what would happen if they ever actually achieved SOTA? They would clamp down on that so fast Dadio's dreidel would spin.
China isn't "all open source"; they still keep their top models out of public view. It's easy to "open source" models when they're so far behind that very few will pay for them.

Open source in quotes because they are not open source and not even close to open source.

And what models do they keep out of public view? What ridiculous propaganda is this?!
Can't we stop using "open-source" when it is just freeware?
Open-weight is both a meaningful and a unique term.
> Every single corporation you deal with is evil then.

I don't know. If my ISP started MITMing my traffic so that they could silently rewrite packets, and/or deleting files on my computer because they thought me sharing my wireless AP with my SO was me trying to compete with them, I'd call them evil.

I believe they tried something similar to the first one a few years ago in the US, and I remember people called that evil to the point where tech giants shut down their websites in protest.

> Gee, I wonder what would happen if they ever actually achieved SOTA? They would clamp down on that so fast Dadio's dreidel would spin

Cool. Let them "achieve SOTA" and close down the models. Let the pendulum swing the other way.

You seem to not understand what China's goal is here. They want the AI bubble to burst and take your 401ks with it. And OAI/ANTs decisions are driving you towards that cliff.

I use gpt 5.5 at work (because they pay for it) and DeepSeek at home (because I pay for it) and while I do agree one is better than the other, I think you’re really overstating how far apart they are. Just my take.
What's a 12-month lead time worth? Not much from what I can tell. Contrary to what these AI companies might tell you, if an AI model can't do it, a human can still do the work.
Honest question: is it possible that, since they might be using the latest/best model to analyze and improve the existing ones, the moat will expand exponentially, making the models better and more efficient at each iteration until there is no point in competing?
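One way to frame that question is a toy compounding model, sketched below in Python. Everything in it is hypothetical: capability as a single number, a fixed one-year head start, and a made-up growth rate. With equal rates, the ratio between leader and follower stays constant while the absolute gap grows; the relative lead only widens if the leader's own rate is higher, which is what the self-improvement argument would need.

```python
def capability(years: float, rate: float, head_start: float = 0.0) -> float:
    """Toy model: capability compounds at `rate` per year; `head_start` is in years."""
    return (1.0 + rate) ** (years + head_start)

RATE = 1.0  # hypothetical: capability doubles every year for both labs

for t in range(4):
    leader = capability(t, RATE, head_start=1.0)  # 12 months ahead
    follower = capability(t, RATE)
    print(f"year {t}: ratio {leader / follower:.1f}, absolute gap {leader - follower:.0f}")
```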
All models from the past two years are close in the general case.

This is just another incremental improvement, rushed out to boost the IPO. AI has the capacity to aid an engineer, but this minor bump in performance will have essentially zero impact on the productivity of an engineer working on real-world solutions when compared with any other major model.

We are trending toward an asymptote, and it can't happen fast enough; that's when the true cost of this will become evident.

Most of HN is stuck in this fantasyland where they insist their local LLM setup is comparable to Opus 4.8 or GPT 5.5. It's like a collective delusion, I've never seen anything like it.
You can get really good results with Chinese models. You're putting Opus and GPT on too high of a pedestal.
I use Chinese models (for simple personal projects), they just don't compare to GPT or Opus for any serious work.

I do not know why every Chinese model fan thinks that people that aren't impressed by them simply don't use them.

The vast majority of software engineers do very little except move JSON around and build CRUD apps.

It's quite obvious that when you don't try to do something particularly complex, there will be literally no difference between GPT, Claude, Gemini, and DeepSeek.

For many things I'm doing in gamedev, Gemini 2.5 Pro was already good enough, even though it was released more than a year ago.

Once you pass a certain threshold, it's just enough.

What constitutes serious work, and how seriously have you tried to do serious work with them? While those claiming a 30B dense model can match Opus 4.6 are either grossly exaggerating or performing rather routine tasks, it's disingenuous in the other direction to claim the latest open 1T models are not useful for serious work. I find those making such claims have rarely spent more than a few minutes on halfhearted attempts, often with recently obsoleted models.

Open-weight models turned a corner around Kimi 2.6, DeepSeek V4 Pro/Flash, hy3, and MiMo 2.5 Pro, similar to how closed LLMs turned a corner around GPT 5.2 and Opus 4.5.

While they remain a step behind closed frontier models, for real-world tasks ranging across functional reactive programming, distributed systems, mathematical modeling, highly optimized to-the-millisecond spatial data structures, complex compute shaders and shader effects, and non-trivial systems involving parser combinators and algebraic effect systems, I can say that open models have very recently gone from useless to productive. For my work, MiMo v2.5 Pro is hands down better than Sonnet 4.6.

Some of the new and open models are very capable now. The truth is, the value of a model is in the mind of the user: the big names impress those who know little and are dazzled by little, but they are bound to end up wrong regardless of how good the model is.
This is ridiculous. How about the rational users who use the best current model regardless of brand? The value of the model is in the quality of the output over time. I give every major model a chance. Coding and scripts in the chat are nothing compared to the power of agentic SWEEEEEEEEE. And nothing is remotely close to claude and gpt. If you're comfortable with being well behind SOTA intelligence, then good for you, but some of us prefer to be efficient with our time and resources. With your mindset, you will never truly SWEEEEEEEEEEEEEEEEEEEE
That isn't rational. Rational is using the model that can best solve your current problem in the timeliest, most cost-conscious manner.

I'm not working on the frontier problems, I don't need god-in-a-box for $600 per month.

It's not god in a box, and it's not $600 per month.

And almost nobody is working on frontier problems. They just want frontier intelligence to solve their given problems in a superior manner.

You're minimizing and exaggerating all the wrong things. Cope more, I guess; more compute for us!