Hacker News
by Akuehne 37 days ago
I feel like lots of people here are just commenting on the headline.

This isn't about the local models you're running on your old gaming rig, or the Tesla P40 rig you built for local LLMs.

This is about code leveraging the local resources where the code is running for its AI needs. Rather than making an API call to an external AI service, the code leverages the AI capabilities built into the hardware it runs on. With modern Apple, Intel, and AMD silicon all shipping dedicated AI acceleration, this is where, IMO, the focus should be heading.
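A minimal sketch of the "local first" pattern described above, assuming two hypothetical backends passed in as plain functions: one that runs on the device's own accelerator and one that calls an external API. The names `run_inference`, `on_device`, and `cloud` are illustrative only and don't correspond to any real SDK.

```python
from typing import Callable, Optional, Tuple

# Hypothetical backend type: takes a prompt, returns generated text.
# A backend that can't run (no accelerator, model not installed) raises RuntimeError.
Backend = Callable[[str], str]


def run_inference(prompt: str, local: Optional[Backend], remote: Backend) -> Tuple[str, str]:
    """Prefer on-device inference; call the remote service only as a fallback.

    Returns (source, output), where source is "local" or "remote".
    """
    if local is not None:
        try:
            return "local", local(prompt)
        except RuntimeError:
            pass  # local path unavailable on this machine; fall through to the API
    return "remote", remote(prompt)


if __name__ == "__main__":
    def on_device(p: str) -> str:  # stand-in for an NPU/GPU-backed local model
        return p.upper()

    def cloud(p: str) -> str:  # stand-in for an external AI API call
        return "remote:" + p

    print(run_inference("hello", on_device, cloud))  # ('local', 'HELLO')
```

The design choice here is that the remote call becomes the exception rather than the default, so cost and availability of the external service only matter on machines without usable local hardware.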

How many FLOPS or whatever can your phone do? I bet it's enough to paint the walls of your living room, or draw a pretty good pelican on a bike.
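Raw FLOPS only bound part of the answer. A common back-of-the-envelope estimate (an assumption, not something from this thread) is that a dense model needs roughly 2 FLOPs per parameter per generated token, and that single-stream decoding also has to read every weight from memory once per token, which often makes memory bandwidth the tighter limit. A rough sketch; all device figures below are hypothetical:

```python
def decode_tokens_per_second(params: float, flops: float,
                             mem_bw_bytes: float, bytes_per_param: float) -> float:
    """Rough upper bound on single-stream decode speed for a dense LLM.

    Assumes ~2 FLOPs per parameter per token (compute bound) and that all
    weights are read from memory once per token (bandwidth bound).
    Real throughput is the smaller bound, and in practice lower still.
    """
    compute_bound = flops / (2 * params)
    bandwidth_bound = mem_bw_bytes / (params * bytes_per_param)
    return min(compute_bound, bandwidth_bound)


# Hypothetical phone: 2B-parameter model, 10 TFLOPS, 50 GB/s memory, 4-bit weights.
print(decode_tokens_per_second(2e9, 1e13, 5e10, 0.5))  # 50.0
```

With these made-up figures the compute bound alone would suggest 2,500 tokens/s, but the bandwidth bound caps it at about 50 tokens/s.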


And this is exactly what the LLM provider industry is fighting tooth and nail. It’s not only because it doesn’t directly contribute to their bottom line; it also directly opposes the idea that LLMs are going to replace entire workers rather than enhance the abilities of individual workers. What we’re headed towards would have been a killer product, and would probably still have shifted a bunch of capital to the bazillionaires, had these companies set more realistic goals rather than banking on being the ones who won the war that “changed everything™”.
As long as Apple and Google put reasonable AI capabilities on device, then software engineers will use those capabilities when it makes sense (the article gives lots of good examples of capabilities that make sense to run locally). As the author notes, it's cheaper and more reliable to run these things locally.

That also doesn't preclude LLM services from being massively successful, they'll just have to justify the pricing and complexity that comes with their adoption, just like any other product.

> That also doesn't preclude LLM services from being massively successful, they'll just have to justify the pricing and complexity that comes with their adoption, just like any other product.

What is completely different from every other product is how much they’re spending, and how much they’re obligating themselves to spend going forward. I think there’s a very good chance that the existing providers could be miles underwater coming out of this. Even if the business is not the everything-to-everybody they’re banking on it being, they still owe all of that money back to the people they borrowed it from, and those lenders will be a lot less likely to float them cash to get back to a normal operating mode if they burned the last ocean of cash promising the universe and wound up with “oh yeah, that’s pretty useful sometimes.”

Yeah, that's a good callout for sure. The spending here is nuts, so I agree that it's not "just another business that has to price itself right to be competitive".

I guess if the time horizon is long, like 20 years, then maybe the spending, as it amortizes, comes more into line?

I was thinking that a comparison could be to cloud providers, each of which had to spend a lot of money building out datacenters before making money. The difference there is that AWS proved the product first, so when Microsoft and Google came along, they knew it would work and be profitable. With AI, nobody has proven it will work and be profitable; they're all competing for that at the same time, which is a potentially dangerous mix for the reasons you cited.

The only way that this even vaguely works, best I can tell, would be on that decade-or-two timeline, but therein lies the problem: all this money getting pumped into data centers right now is going to produce data centers that are running old, inefficient, slow GPUs by 5-years-from-now standards. And GPUs are by far the most expensive part of these data centers… having the buildings is barely an asset. We’re investing all the money in right now’s technology in one of the fastest moving hardware segments and for some inexplicable reason, think that will lead to a sustainable advantage. What’s to stop someone 5 years from now, waiting for the dust to settle, then spending way less money for more compute and just mopping the floor with everybody in this sector… and that’s (unreasonably, IMO) assuming that local applications won’t become good enough to take too large a bite from their business before that.

And look at the difference in spending compared to when they built out general-purpose cloud data centers, which, even then, had other potential uses if the business failed. What are they going to do… start a massive, extremely expensive pre-rendered online gaming service? Only render Disney movies?

I dunno. None of this makes sense to me.

These datacenters are already running old, inefficient, slow GPUs from five years ago in addition to newly released cards, because anything newer than that is extremely bottlenecked and they need all the compute they can get. Why should it be any different in five years' time? Even Nvidia is rumored to be about to bring back the RTX 3060, an Ampere-architecture card released in 2021. It's just fine.
> they'll just have to justify the pricing

like by selling it at a loss to build dependencies and then jacking the price up year after year by whatever amount is just below the cost of removing the dependency

In an ideal world they will. In reality, most will use online AI, because it's the path of least resistance and it's more familiar.
> ...it also directly opposes the idea that LLMs are going to replace entire workers rather than enhance the abilities of individual workers.

Which also, as I feel the need to remind everyone every time it comes up, has not yet once been actually shown to be a workable strategy. For any worker in any industry.

And to be clear, I'm talking about a worker, sitting in a chair, replaced with an agent, sitting in... a server, I guess, where nothing else about the org has to be changed. That's what's being advertised and sold, and it has never to my knowledge actually happened.

Mainframe industry vs. personal computers.

If their product is "access to a big model running on a really big computer" (if we can count 'multiple data-centers' as a single enormous distributed computer), then the product "small, accessible device that everyone has" risks killing their cash cow.

Ironically enough, the first company to really focus on "an LLM in every phone" will have a good shot at actually being the one that "changed everything™", in the way Microsoft changed the world from IBM mainframes to PCs, or Apple made smartphones a thing.

As an aside, the mainframe industry was profitable for decades before PCs took over. It’s not like they spent a zillion dollars ramping up at the same time.
The mainframe industry IS profitable.
Actually, you can do way more things than that. We have optimized it to process 2TB of high-def video on an M5 MBP in under 24 hours, including everything such as speech understanding, face recognition, LLM, and VLM. Super fun.
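For scale, a quick check of what "2TB in under 24 hours" implies: the whole pipeline has to sustain at least roughly 23 MB/s on average. This assumes decimal terabytes (1 TB = 10^12 bytes); the helper name is illustrative.

```python
def required_throughput_mb_per_s(total_bytes: float, hours: float) -> float:
    """Average rate, in decimal MB/s, needed to process total_bytes within hours."""
    return total_bytes / (hours * 3600) / 1e6


rate = required_throughput_mb_per_s(2e12, 24)  # 2 TB in 24 h (decimal units assumed)
print(f"{rate:.1f} MB/s")  # 23.1 MB/s
```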
If Steve Jobs were alive, Apple would have already demoed this as a new line of Macs with open-weight models pre-installed, with hooks into all of their existing content creation software.

And he would have the audience believing all the demos were running through third party AI providers, until at the last moment explaining “actually all of that ran on device with no connection to any external services.”

hhh, "one more thing"
Is this project public or have you written about it anywhere?
Yeah, we've recently made it public. You can check it out here: https://clipto.com

Be aware that it is still a beast that sucks in a lot of memory.

Oh, one more thing ;) remember to keep your Mac plugged in...

> draw a pretty good pelican on a bike.

You mean the famously hard task? The one picked because it stretches frontier models to their limits?

It was a famously hard task. It was an ingenious idea for an unexpected task that falls outside of the bounds of predictable normal input but is still readily comprehended by the public.

Unfortunately, as soon as it's a famously hard task trainers know they need to succeed at it and it loses a lot of the power to detect correctness.

In fairness, that isn't due to a lack of compute.
https://simonwillison.net/2026/Apr/22/qwen36-27b/

Maybe this is an example of training overfit. But it won't be too long before local models chew through the "famously hard tasks", except possibly ARC-AGI. That's one benchmark that keeps evolving as capabilities improve, and every time a new ARC-AGI benchmark is released, it makes the SOTA LLMs look pathetic, because there is very little understanding or transferability with LLMs. But in terms of benchmarkable micro-tasks, the local LLMs are improving.

I just did something exactly like this. I have a self-hosted personal dashboard and one of the APIs I'm reading gives slightly too verbose of an output. So I added a feature to summarize the text using Qwen 3.5 2B which happily runs on a CPU. I've never clocked the tokens per second because I only generate like 100 tokens an hour in a very narrow domain of knowledge and speed isn't critical.
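A minimal sketch of this kind of setup, assuming the small model is served by an Ollama-style local HTTP server (the comment doesn't say which runtime is used). The endpoint URL is Ollama's default, and the model tag `qwen3.5:2b` is a hypothetical placeholder; check the runtime's documentation for the actual request fields and model names.

```python
import json
import urllib.request

# Assumptions: an Ollama-style server on localhost; the model tag is hypothetical.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3.5:2b"


def build_request(text: str, max_chars: int = 4000, max_tokens: int = 100) -> dict:
    """Build a non-streaming summarization request body."""
    prompt = "Summarize the following in two sentences:\n\n" + text[:max_chars]
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,                         # ask for one JSON object, not a stream
        "options": {"num_predict": max_tokens},  # cap the number of generated tokens
    }


def summarize(text: str, timeout: float = 300.0) -> str:
    """POST the request to the local server and return the generated text."""
    body = json.dumps(build_request(text)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    # Generous timeout: CPU-only inference is slow, but speed isn't critical here.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Truncating the input and capping output tokens keeps each call small, which fits a low-volume, narrow-domain job like the one described.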
A phone makes a very crappy AI inference rig. It's battery powered and can't even really run at 100% utilization on an ongoing basis due to how challenging the thermals are.
At the moment, yes. The one possible silver lining with all of the current hardware crunch is that it _should_ force some hardware advancements. The last couple of years, hardware has been kinda boring. My M1 Max is still zippy as all hell and doesn't really need to be upgraded, unless I am committing to local AI inference.
I kinda assume phones are going to be battery powered for the foreseeable future. "Gaming" phones with better cooling do exist, but they are a tiny niche. Most local AI users will want to serve their inference needs through a very different kind of system.
Yes, but the battery tech itself is improving. We're already seeing new phones approach 8,000 mAh internal batteries, which is large enough that you can splurge on compute and still have some left over at the end of the day.
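Since mAh measures charge rather than energy, a rough conversion helps size the compute budget. This sketch assumes a nominal cell voltage of 3.85 V (common for phone batteries, but it varies by cell) and a hypothetical sustained inference draw of 5 W; it ignores the screen, radios, and the thermal limits raised above.

```python
def battery_wh(capacity_mah: float, nominal_v: float = 3.85) -> float:
    """Convert capacity in mAh to energy in Wh (nominal voltage is an assumption)."""
    return capacity_mah / 1000 * nominal_v


def hours_at(draw_w: float, capacity_mah: float) -> float:
    """Hours of runtime if the whole battery fed a constant draw_w load."""
    return battery_wh(capacity_mah) / draw_w


print(round(battery_wh(8000), 1))   # 30.8 (Wh)
print(round(hours_at(5, 8000), 2))  # 6.16 (hours at a hypothetical 5 W)
```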
> it _should_ force some hardware advancements

I'm very curious what kind of hardware advancements you're imagining. Because we're already kind of near a physical wall regarding heat dissipation on phones.

I mean hey, maybe foundational physics will surprise the world with a radical breakthrough that disappears heat into a black hole or something, but I sure wouldn't hold my breath

More likely it would force software advancements. Current models are horribly inefficient.
eGPU cradles, presumably, for people with intense local model execution requirements until it can be made to work in the device? This is exactly like the POS dongles Square had until tap to pay was more widespread?
Launch everyone's phones into space.
Heat dissipation is even harder in space
water cooled pant pockets
The iPant? Or a Samsung WCPP1?
I was writing about just this last week for fun: AI + hardware team-ups to build localized AI with functions specialized to your organization. E.g., Adobe Studio AI in an on-premises box, made by Apple and powered by something like Cohere, with privacy:

https://www.notion.so/adeelkhamisa/Cohere-s-next-steps-to-be...