| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by JumpCrisscross 284 days ago
	I don't think we're anywhere close to running cutting-edge LLMs on our phones or laptops. What may be around the corner is running great models on a box at home. The AI lives at home. Your thin client talks to it, maybe runs a smaller AI on device to balance latency and quality. (This would be a natural extension for Apple to go into with its Mac Pro line. $10 to 20k for a home LLM device isn't ridiculous.)

6 comments

simonw 284 days ago

Right now you can run some of the best available open weight models on a 512GB Mac Studio, which retails for around $10,000. Here's Qwen3-Coder-480B-A35B-Instruct running at 24 tokens/second at 4bit: https://twitter.com/awnihannun/status/1947771502058672219 and Deep Seek V3 0324 in 4-bit at 20 toks/sec https://twitter.com/awnihannun/status/1904177084609827054

You can also string two 512GB Mac Studios together using MLX to load even larger models - here's 671B 8-bit DeepSeek R1 doing that: https://twitter.com/alexocheema/status/1899735281781411907

link

zargon 283 days ago

What these tweets about Apple silicon never show you: waiting 20+ minutes for it to ingest 32k context tokens. (Probably a lot longer for these big models.)

link

logicprog 283 days ago

Yeah, I bought a used Mac Studio (an M1, to be fair, but still a Max and things haven't changed since) hoping to be able to run a decent LLM on it, and was sorely disappointed thanks to the prompt processing speed especially.

link

alt227 283 days ago

No offense to you personally, but I find it very funny when people hear marketing copy for a product and think it can do anything they said it can.

Apple silicon is still just a single consumer grade chip. It might be able to run certain end user software well, but it cannot replace a server rack of GPUs.

link

zargon 283 days ago

I don’t think this is a fair take in this particular situation. My comment is in response to Simon Willison, who has a very popular blog in the LLM space. This isn’t company marketing copy; it’s trusted third parties spreading this misleading information.

link

alt227 277 days ago

Fair enough, apologies for assuming.

link

brokencode 284 days ago

Not sure about the Mac Pro, since you pay a lot for the big fancy case. The Studio seems more sensible.

And of course Nvidia and AMD are coming out with options for massive amounts of high bandwidth GPU memory in desktop form factors.

I like the idea of having basically a local LLM server that your laptop or other devices can connect to. Then your laptop doesn’t have to burn its battery on LLM work and it’s still local.

link

JumpCrisscross 284 days ago

> Not sure about the Mac Pro, since you pay a lot for the big fancy case. The Studio seems more sensible

Oh wow, a maxed out Studio could run a 600B parameter model entirely in memory. Not bad for $12k.

There may be a business in creating the software that links that box to an app on your phone.

link

simonw 283 days ago

I have been using a Tailscale VPN to make LM Studio and Ollama running on my Mac available to my iPhone when I leave the house.

link

brokencode 284 days ago

Perhaps said software could even form an end to end encrypted tunnel from your phone to your local LLM server anywhere over the internet via a simple server intermediary.

The amount of data transferred is tiny and the latency costs are typically going to be dominated by the LLM inference anyway. Not much advantage to doing LAN only except that you don’t need a server.

Though the amount of people who care enough to buy a $3k - $10k server and set this up compared to just using ChatGPT is probably very small.

link

JumpCrisscross 284 days ago

> amount of people who care enough to buy a $3k - $10k server and set this up compared to just using ChatGPT is probably very small

So I maxed that out, and it’s with Apple’s margins. I suspect you could do it for $5k.

I’d also note that for heavy users of ChatGPT, the difference in energy costs for a home setup and the price for ChatGPT tokens may make this financially compelling for heavy users.

link

brokencode 283 days ago

True, it may be profitable for pro users. At $200 a month for ChatGPT Pro, it may only take a few years to recoup the initial costs. Not sure about energy costs though.

And of course you’d be getting a worse model, since no open source model currently is as good as the best proprietary ones.

Though that gap should narrow as the open models improve and the proprietary ones seemingly plateau.

link

dghlsakjg 284 days ago

That software is an HTTP request, no?

Any number of AI apps allow you to specify a custom endpoint. As long as your AI server accepts connections to the internet, you're gravy.

link

JumpCrisscross 283 days ago

> That software is an HTTP request, no?

You and I could write it. Most folks couldn’t. If AI plateaus, this would be a good hill to have occupied.

link

dghlsakjg 283 days ago

My point is, what is there to build?

The person that is willing to buy that appliance is likely heavily overlapped with the person that is more than capable of pointing one of the dozens of existing apps at a custom domain.

Everyone else will continue to just use app based subscriptions.

Streaming platforms have plateaued (at best), but self hosted media appliances are still vanishingly rare.

Why would AI buck the trend that every other computing service has followed?

link

itsn0tm3 283 days ago

You don’t tell your media player company secrets ;)

I think there is a market here, solely based on actual data privacy. Not sure how big it is but I can see quite some companies have use for it.

link

JumpCrisscross 283 days ago

> what is there to build?

Integrated solution. You buy the box. You download the app. It works like the ChatGPT app, except it's tunneling to the box you have at home which has been preconfigured to work with the app. Maybe you have a subscription to keep everything up to date. Maybe you have an open-source model 'store'.

link

theshrike79 283 days ago

It's really easy to whip up a simple box that runs local LLM for a whole home.

Marketing it though? Not doable.

Apple is pretty much the only company I see attempting this with some kind of AppleTV Pro.

link

data-ottawa 284 days ago

This is what I’m doing with my amd 395+.

I’m running docker containers with different apps and it works well enough for a lot of my use cases.

I mostly use Qwen Code and GPT OSS 120b right now.

When the next generation of this tech comes through I will probably upgrade despite the price, the value is worth it to me.

link

milgrum 283 days ago

How many TPS do you get running GPT OSS 120b on the 395+? Considering a Framework desktop for a similar use case, but I’ve been reading mixed things about performance (specifically with regards to memory bandwidth, but I’m not sure if that’s really the underlying issue)

link

data-ottawa 283 days ago

30-40 at 64k context, but it's a mixture of experts model.

A 70b dense model is slower

Qwen coder 30b Q4 runs 40+.

link

bigyabai 284 days ago

> $10 to 20k for a home LLM device isn't ridiculous.

At that point you are almost paying more than the datacenter does for inference hardware.

link

JumpCrisscross 284 days ago

> At that point you are almost paying more than the datacenter does for inference hardware

Of course. You and I don't have their economies of scale.

link

bigyabai 284 days ago

Then please excuse me for calling your one-man $10,000 inference device ridiculous.

link

JumpCrisscross 284 days ago

> please excuse me for calling your one-man $10,000 inference device ridiculous

It’s about the real price of early microcomputers.

Until the frontier stabilizes, this will be the cost of competitive local inference. Not pretending what we can run on a laptop will compete with a data centre.

link

simonw 284 days ago

Plenty of hobbies are significantly more expensive than that.

link

bigyabai 283 days ago

The rallying cry of money-wasters the world over. "At least it's not avgas!"

link

seanmcdirmid 283 days ago

Some people lose lots of money on boats, some people buy a fancy computer instead and lose less, although still a lot of, money.

link

brookst 283 days ago

How is it not impressive to be able to do something at quantity 1 for roughly the same price megacorps get at quantity 100,000?

Try building a F1 car at home. I guarantee your unit cost will be several orders of magnitude higher than the companies who make several a year.

link

rpdillon 283 days ago

I mean, not really? Yeah, I pay to go to the movies and sit in a theater that they let me buy a ticket for, but that doesn't mean people that want to set up a nice home theater are ridiculous, they just care more about controlling and customizing their experience.

link

grim_io 283 days ago

Some would argue that the home theater is a superior experience to a crowded, far away movie theater where the person's head in front of you takes up a quarter of the screen.

The same can't be said for local inference. It is always interior in experience and quality.

A reasonable home theater pays for itself over time if you watch a lot of movies. Plus you get to watch shows as well, which the limited theater program doesn't allow.

I can buy over 8 years of the Claude max $100 plan for the price of the 512GB M3 Ultra. And I can't imagine the M3 being great at this after 5 years of hardware advancement.

link

rpdillon 278 days ago

> The same can't be said for local inference. It is always interior in experience and quality.

Not really. I do it because it offers me more control. That's higher quality in my book.

link

vonneumannstan 283 days ago

Almost? Isn't a single h100 like 30k which is the bare minimum to run a big model?

link

ben_w 283 days ago

> $10 to 20k for a home LLM device isn't ridiculous.

That price is ridiculous for most people. Silicon Valley payscales can afford that much, but see how few Apple Vision Pros got sold for far less.

link

vonneumannstan 283 days ago

Doesnt gpt-oss-120b perform better across the board at a fraction of the memory? Just specced a $4k mac studio that can easily run that at 128 gb memory.

link