

	by ttul 57 days ago
	If you do the math (I did), in 2 years, open source models that you can run on a future MacBook Pro will be as capable as the frontier cloud models are today. Memory bandwidth is growing rapidly, as is the die area dedicated to the neural cores. And all the while, we have the silicon getting more power efficient and increasingly dense (as it always does). These hardware improvements are coming along as the open source models improve through research advancements. And while the cloud models will always be better (because they can make use of as much power as they want to, up in the cloud), what matters to most of us is whether a model can do a meaningful share of knowledge work for us. At the same time, energy consumption to run cloud infrastructure is outpacing the creation of new energy supply, which is a problem not easily solved. I believe scarcity of energy will increasingly drive frontier labs toward power efficiency, which necessarily implies that the performance gap between cloud and local execution will narrow.

6 comments

nl 57 days ago

An Opus 4.7/Gpt5.5 class model is 5 trillion parameters[1].

To run an 8-bit quantized version of that you need roughly 5TB of RAM.

Today that is around 18 NVIDIA B300s. That's around $900,000, not including the computers to run them in.

It's true that the capability of open source models is improving, but running actual frontier models on your MBP seems a way off.

[1] https://x.com/elonmusk/status/2042123561666855235?s=20 (and Elon has hired enough people out of those labs to have a fair idea)
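
The arithmetic in this comment can be reproduced with a rough sketch. The parameter count and quantization width come from the comment. The 288 GB of memory per B300 and the $50,000 per-GPU price are assumptions: the first is the commonly quoted capacity, and the second is simply what $900,000 across 18 GPUs implies. Only the weights are counted here, not KV cache or activations.

```python
import math

def weights_bytes(n_params: float, bits_per_param: int) -> float:
    """Bytes needed to hold the weights alone (no KV cache, no activations)."""
    return n_params * bits_per_param / 8

def gpus_needed(total_bytes: float, gpu_mem_bytes: float) -> int:
    """Smallest number of GPUs whose combined memory fits the weights."""
    return math.ceil(total_bytes / gpu_mem_bytes)

N_PARAMS = 5e12          # from the comment: 5 trillion parameters
BITS = 8                 # from the comment: 8-bit quantization
GPU_MEM = 288e9          # assumption: 288 GB of memory per B300
PRICE_PER_GPU = 50_000   # assumption: implied by $900,000 / 18 GPUs

total = weights_bytes(N_PARAMS, BITS)
count = gpus_needed(total, GPU_MEM)

print(f"weights: {total / 1e12:.1f} TB")      # weights: 5.0 TB
print(f"GPUs:    {count}")                    # GPUs:    18
print(f"cost:    ${count * PRICE_PER_GPU:,}") # cost:    $900,000
```

Changing `BITS` to 4 halves the footprint to 2.5 TB, which still needs 9 GPUs under the same assumptions.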


crazylogger 57 days ago

People had this "why you probably can't run a GPT-4 (or even GPT-3.5) class model on your MBP anytime soon" conversation before.

Today's LLMs are able to pack much more capability into fewer parameters than those of 2023. We might still be at a very rudimentary phase of this technology, with low-hanging efficiency gains to be had left and right. These models consume many orders of magnitude more energy than a human brain, which all suggests room for improvement.

The right question: is there a law in information theory that fundamentally prevents a 70B model of any architecture from being as smart as Opus 4.7?


kvern 56 days ago

There is a huge gap between "in two years" and "theoretically possible"


hnben 56 days ago

>> People had this "why you probably can't run a GPT-4 (or even GPT-3.5) class model on your MBP anytime soon" conversation before.


Difwif 57 days ago

The OP said "as capable as the frontier cloud models are today" which might assume model improvements that do more with less. Opus 4.7/Gpt5.5 performance might be achievable with a fraction of the parameters.


spflueger 56 days ago

Exactly. I also feel that being able to choose a model for the use case could be an idea worth pursuing. So instead of trying to squeeze all kinds of knowledge into a single model, even if it's MoE, just focus models on use cases. I bet you only need double-digit-billion-parameter models for that, with the same or even better performance.


ako 56 days ago

Opus and Gpt are generic LLMs with knowledge on all sorts of topics. For specific use cases you probably don't need all the parameters? Suppose you want to generate code with opencode: what parts of the generic LLM are needed and what parts can be removed?


byzantinegene 56 days ago

We're already doing that; it's called distillation, and it's how models like DeepSeek are trained.


Nimitz14 56 days ago

I think your own math leads to the conclusion that the public APIs are not serving models of that size. They couldn’t afford to.


hedgehog 57 days ago

As far as I can tell Minimax M2.7 is better than anything available a year ago, but it runs on an ordinary PC. Will that continue? Not sure, but the trend has continued for the last two years and I don't know of any fundamental limits the models are approaching.


ricardobayes 56 days ago

I wish more people were aware of this. I think so much of the current optimism is based on "it doesn't matter if companies are raising prices since I'm just going to run the model locally", which doesn't fly.


otabdeveloper4 56 days ago

> An Opus 4.7/Gpt5.5 class model is 5 trillion parameters.

Or so they say.

If it's true then that just shows how far behind the cloud providers are lagging while wasting investor money.

(There's a huge amount of diminishing returns in increasing parameter counts and the intelligent AI company should be hard at work figuring out the optimal count without overfitting.)


rurban 56 days ago

Doing that will only be possible with something like better 3D NAND flash memory, which needs new hardware. People are already trying to bring that to the market. I contemplated taking a compiler position at such a company.


zozbot234 56 days ago

HBF is a non-starter: it runs way too hot compared to DRAM (which only pays for refresh at idle) for the same memory traffic. It only helps for extremely sparse MoE models, probably sparser than we're seeing today.


zozbot234 57 days ago

> An Opus 4.7/Gpt5.5 class model is 5 trillion parameters[1].

You could run it on a cluster of nodes that each do some mix of fetching parameters from disk and caching them in RAM. Use pipeline parallelism to minimize network bandwidth requirements given the huge size. Then time to first token may be a bit slow, but sustained inference should achieve enough throughput for a single user. That's a costly setup of course, but it doesn't cost $900k.


nl 57 days ago

> You could run it on a cluster of nodes

Not sure this is a MBP either.


bigyabai 56 days ago

Not even a cluster of Mac Pros could run a dense 5T parameter model with RDMA, to my knowledge.


zozbot234 56 days ago

SOTA models are reportedly MoE, not dense.


bigyabai 56 days ago

A 5T MoE model is still bottlenecked by streaming weights from SSD, in addition to compute bottlenecks during prefill and decode.


npunt 57 days ago

I did this calculation a while ago and don't think frontier models are just a few MacBook Pro generations away. Yes, numbers reliably go up in tech in general, but semiconductors and standards specifically have long lead times and published roadmaps, so we can have high confidence in what we're getting even 3-4 years out in terms of both transistor density and RAM speeds.

In mid-2028 we have N2E/N2P with around 15% greater transistor density than today's N3P, and by EOY2028 we'll likely have A14 with about 35-40% density improvement.

Meanwhile, we'll be on LPDDR6 by that point, which takes M-series Pros from 307GB/s -> ~400GB/s, and Maxes from 614GB/s -> ~800GB/s.

Model improvements obviously will help out, but on the raw hardware front these aren't in the ballpark for frontier model numbers. An H100 has 3TB/s memory bandwidth, fwiw
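
To put the bandwidth figures above side by side, here is a small sketch that uses only the numbers quoted in this comment (in GB/s, with the H100's "3TB/s" taken as 3000 GB/s). The `speedup` helper is a hypothetical name introduced for illustration.

```python
# Memory bandwidth figures quoted in the comment, in GB/s: (today, with LPDDR6).
BANDWIDTH_GB_S = {
    "M-series Pro": (307, 400),
    "M-series Max": (614, 800),
}
H100_GB_S = 3000  # "3TB/s" from the comment

def speedup(current: float, future: float) -> float:
    """Ratio of future to current bandwidth."""
    return future / current

for chip, (now, future) in BANDWIDTH_GB_S.items():
    gain = speedup(now, future)
    gap = H100_GB_S / future
    print(f"{chip}: {gain:.2f}x over today, H100 still {gap:.2f}x ahead")
```

Under these figures both chips gain roughly 1.3x, while an H100 remains 7.5x ahead of the projected Pro and 3.75x ahead of the projected Max.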


zozbot234 57 days ago

What do you need 3 TB/s memory bandwidth for in a single user context? DeepSeek V4 pro (the latest near-SOTA model) has about 25 GB worth of active parameters (it uses an FP4 format for most layers), which gives about 12 tok/s on a 307 GB/s platform if memory bandwidth is the bottleneck, maybe a bit less than that if you consider KV cache reads. That's not great, but it's not terrible either for a pro quality model. Of course that totally ignores RAM limits, which are the real issue at present: limited RAM forces you to fetch at least some fraction of params from storage, which while relatively fast is nowhere near as fast as RAM, so your real tok/s are far lower (about 2 for a broadly similar model on a top-end M5 Pro laptop).
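
The estimate in this comment follows from a simple bandwidth-bound model: if each decoded token has to read every active parameter once, tokens per second is at most bandwidth divided by active bytes. A minimal sketch, using the 25 GB and bandwidth figures from the thread. The second function is a simplified assumption (RAM and storage reads happen one after the other), and its storage bandwidth and storage fraction are hypothetical values chosen only for illustration.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, active_gb: float) -> float:
    """Upper bound on decode speed when each token reads all active weights once."""
    return bandwidth_gb_s / active_gb

def decode_tps_with_storage(active_gb: float, ram_gb_s: float,
                            storage_gb_s: float, fraction_from_storage: float) -> float:
    """Same bound when part of the active weights must come from storage.

    Simplifying assumption: RAM and storage reads are sequential, not overlapped.
    """
    ram_time = active_gb * (1 - fraction_from_storage) / ram_gb_s
    storage_time = active_gb * fraction_from_storage / storage_gb_s
    return 1 / (ram_time + storage_time)

ACTIVE_GB = 25  # from the comment: about 25 GB of active parameters

# Bandwidth figures quoted in the thread, in GB/s.
for label, bw in [("M-series Pro", 307), ("M-series Max", 614), ("H100", 3000)]:
    print(f"{label}: {decode_tokens_per_second(bw, ACTIVE_GB):.1f} tok/s")

STORAGE_GB_S = 6.0  # hypothetical SSD read bandwidth, not from the thread
FRACTION = 0.1      # hypothetical share of active weights read from storage
tps = decode_tps_with_storage(ACTIVE_GB, 307, STORAGE_GB_S, FRACTION)
print(f"Pro with {FRACTION:.0%} of weights from storage: {tps:.1f} tok/s")
```

Even a small share of weights served from storage dominates the per-token time in this model, which is consistent with the comment's point that RAM capacity, not bandwidth, is the practical limit.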


xorcist 57 days ago

That's not "math". That's a "wild guess", or baseless extrapolation at best.


polski-g 57 days ago

My son doubled in size in the first 8 months of his life. At age 12, he will be larger than the Moon.


riffraff 56 days ago

One of my favorite xkcd

https://xkcd.com/605/


CMay 56 days ago

So long as you don't require deep search grounding like massive web indexes or document stores which are hard to reproduce locally. You can do local agentic things that get close or even do better depending on search strategy, but theoretically a massive cloud service with huge data stores at hand should be able to produce better results.

In practice unless you're doing some kind of deep research thing with the cloud, it'll try to optimize mostly for time and get you a good enough answer rather than spending an hour or two. An hour of cloud searching with huge data stores is not equivalent to an hour of local agentic searching, presumably.

I think that problem will improve a little in the coming years as we kind of create optimized data curation, but the information world will keep growing so the advantage will likely remain with centralized services as long as they offer their complete potential rather than a fraction.


dualvariable 56 days ago

Also, all the cloud models don't have to be the best frontier models, and you don't need to focus on hitting the benchmark of shrinking Opus 4.7 down to a single MBP to make significant improvements. If you get it so that an Opus 4.7 benchmark-compatible model can run in $250k of datacenter capex (and associated reduced opex for power+cooling) that'd be a massive cost improvement that makes the cloud models cheaper. And for most consumers that'll probably be good enough. You don't need to run on a $5k laptop to make a big difference.


rc1 57 days ago

Show your working / explain your math?


parineum 57 days ago

https://xkcd.com/605/
