by coffeecoders 284 days ago

I agree that it's kind of magical that you can download a ~10GB file and suddenly your laptop is running something that can summarize text, answer questions and even reason a bit.

The trick is balancing model size vs RAM: 12B–20B is about the upper limit for a 16GB machine without it choking.
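The size/RAM trade-off is mostly arithmetic: weight memory is roughly parameters × bits per weight / 8. Here is a minimal sketch. It assumes a flat, hypothetical 6 GB reserved for macOS, other apps and the KV cache. Real block-quantized formats also spend slightly more than 4 bits per weight, because they store per-block scales.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone in GB (1 GB = 1e9 bytes)."""
    # params * bits / 8 bytes; the 1e9 factors for "billion" and "GB" cancel out
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, ram_gb: float,
         reserved_gb: float = 6.0) -> bool:
    """Rough check that the weights fit in RAM left after the OS, apps and KV cache.

    reserved_gb is a hypothetical allowance, not a measured figure.
    """
    return weight_gb(params_billion, bits_per_weight) <= ram_gb - reserved_gb


if __name__ == "__main__":
    for params in (8, 12, 20, 30):
        for bits in (4, 8, 16):
            print(f"{params}B @ {bits}-bit: {weight_gb(params, bits):5.1f} GB, "
                  f"fits in 16GB: {fits(params, bits, 16)}")
```

Under these assumptions, a 20B model at 4 bits (10 GB) only just fits in 16 GB. At 8 bits, even a 12B model does not fit.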

What I find interesting is that these models don't actually hit Apple's Neural Engine; they run on the GPU via Metal. Core ML isn't great for custom runtimes, and Apple hasn't given developers low-level access to the ANE, AFAIK. And then there are the memory bandwidth and dedicated SRAM issues. Hopefully Apple optimizes Core ML to map transformer workloads to the ANE.

11 comments

giancarlostoro 283 days ago

I feel like Apple needs a new CEO; I've felt this way for a long time. If I had been in charge of Apple, I would have embraced local LLMs and built an inference engine that optimizes models designed for Nvidia. I probably would also have toyed around with the idea of selling server-grade Apple Silicon processors and opening up the GPU spec so people can build against it. It seems like Apple plays it too safe. While Tim Cook is good as a COO, he's still running Apple as a COO. They need a man of vision, not a COO, at the helm.

aurareturn 283 days ago

I think if Cook had vision, he could have started something called Apple Enterprise, sold Apple Silicon servers and made AI chips. I agree he’s too conservative and has no product vision. Great manager, though.

seanmcdirmid 283 days ago

I was pleasantly surprised Apple Silicon came out at all. Someone at Apple has their eye on long-term vision, at least; they didn't do this on a whim.

flutas 283 days ago

Or someone told Tim "we can save $XYZ per phone if we switch to custom-designed silicon, and potentially expand it to the Mac as well so we no longer have Intel overheating our MacBooks."

He was, after all, more of an operations guy than a product guy before moving into the CEO role.

seanmcdirmid 283 days ago

The unified GPU and unified memory design was pretty important. They didn’t just replace Intel; they replaced AMD/NVIDIA too. The GPUs in high-end Apple silicon are even good enough for mid-size model inference, and unified memory makes it somewhat cost-effective. That advantage probably wasn’t planned; it comes from a lot of good execution and smart R&D.

jychang 283 days ago

To be fair, Apple has HATED Nvidia since the 8400M and 9400M debacle. They probably saw replacing Nvidia as a bigger benefit than replacing Intel.

mrexroad 283 days ago

They did have Xserve back in the day. As great as Apple silicon is for running local LLMs while also being a general-purpose computing device, it’s not clear that Apple silicon has enough of a differentiating advantage over a rack of Nvidia GPUs to make it worthwhile in the enterprise…

Miraste 283 days ago

Strange to be saying this about Apple products, but its advantage is that it's way, way cheaper.

renmillar 283 days ago

Would probably be different if NVIDIA viewed it as competition for data center market share

nxobject 283 days ago

I think that would spread Apple’s chip team too thinly between competing priorities, and it would require them to do E2E stuff they’d never be interested in doing. What’s always happened, even under Jobs, is that Apple would do something nice and backend-y, and then not be able to keep it up as they poured resources into some consumer product. (See: WebObjects, Xserve, Mac OS Server.)

wpm 282 days ago

WebObjects was from NeXT. The Xserve saw regular updates until 2010. Mac OS Server was a GUI for a bunch of open-source tools, and where it wasn’t (Workgroup Manager), they replaced it with MDM.

Apple had more money when they killed these than they did when these products were introduced. It’s not a resources issue. It’s a care issue. Same reason they fired the Mac OS Automation team. Same reason their documentation sucks hot diarrhea and all their good stuff is in the “documentation archive”. Penny pinching. “Shareholder value”. New blood destroying shit they didn’t understand.

alt227 283 days ago

Apple silicon does not compete well in multicore spaces. People seem to think that because it can run single-core things really well on a laptop, it can do anything. Servers regularly have 100-200 CPU cores maxed out by rapid-fire threads. This is not what Apple silicon excels at.

On top of that, it only performs so well on consumer devices because Apple controls the hardware and OS and can tune both together. Creating server hardware would mean allowing Linux to be installed on it, and Linux would need to run equally well. Apple would never put the development time into Linux kernel/drivers to make this happen.

wpm 282 days ago

Both Intel and AMD sell server CPUs with fewer than 100, hell, fewer than 32 cores.

There is, of course, a market for that. Not everyone needs a $4000 electric bill. Apple just can’t take its typical lion’s share of the profits in that market, so they don’t bother.

packetlost 283 days ago

I know off the top of my head at least 3 places that would happily purchase a couple of Xserves (one of which probably still has one) running Mac OS Server. Linux isn't as hard a requirement as you think.

giancarlostoro 282 days ago

Hell... I can think of loads of places running servers on WINDOWS (namely all of my employers, including F500 companies), so I am not surprised that someone would run macOS as a server. At least macOS is Unix-based ;)

swiftcoder 283 days ago

> This is not what Apple silicon excels at

Not at the moment, no. I feel like the Apple silicon team would probably rise to that challenge, though.

otterley 283 days ago

> Apple silicon does not compete well in multicore spaces.

Can you elaborate on this? Maybe with some useful metrics?

brookst 283 days ago

Is expansion to all possible markets really a sign of product vision? Windows is in everything from ATMs to servers to cheap laptops, and I am not sure it’s a better product for it OR that Microsoft makes more money that way. Certainly the support burden for a huge number of long-tail applications is enormous.

And I suppose we’re giving credit to other people for Watch, AirPods, Vision Pro?

giancarlostoro 283 days ago

It doesn't just end with AI, but AI seems the most blatant case. At a bare minimum, he could assign someone to fulfill that vision for AI. Google has its own chips, which it scales. Apple doesn't need to rebuild ChatGPT, but they could very much do what Microsoft does with Phi and provide base models trained and optimized for Apple Silicon to all their users. It seems they are already doing something for Xcode and Swift, but they're just barely scratching the surface.

I remember when the iPhone X became a thing; it was because consumers were extremely underwhelmed by Apple at the time. It's like they kicked it up less than a notch, sadly.

If Tim Cook decided to be a little more of a visionary, I would say keep him. At the very least, I would prefer that he delegate the visionary work to someone; he will eventually need a successor.

jbverschoor 283 days ago

Local LLMs: everybody is worried about privacy, and many people (still) don’t want to buy subscriptions.

Just sell a proper HomePod with 64GB-128GB of RAM that handles everything, including your personal LLM, Time Machine if needed, and Back to My Mac (Tailscale/ZeroTier).

Plus, they could compete efficiently with the other cloud providers.

brookst 283 days ago

It’s a mistake to generalize from the HN population.

Most people don’t care about privacy (see: success of Facebook and TikTok). Most people don’t care about subscriptions (see: cable TV, Netflix).

There may be a niche market for a local inference device that costs $1000 and has to be replaced every year or two during the early days of AI, but it’s not a market with decent ROI for Apple.

j45 282 days ago

An iPhone, MacBook, etc. all cost in the $1000 range.

There was a post about the new iPhone using the A19, which includes a feature that makes local inference much easier.

If that makes it to M5, I think the local inference case continues to grow with each M processor.

bigyabai 283 days ago

> Just sell a proper HomePod with 64GB-128GB ram

The same HomePod that sold almost as poorly as Vision Pro despite a $349.99 MSRP? Apple charges $400 to upgrade an M4 to 64GB and a whopping $1,200 for the 128GB upgrade.

The consumer demand for an $800+ device like this is probably zilch; I can't imagine it's worth Apple's time to gussy up a nice UX or support it long-term. What you are describing is a Mac with extra steps; you could probably hack together a similar experience with Shortcuts if you had enough money and a use case. An AI HomePod server would only be efficient at wasting money.

redundantly 283 days ago

> The same Homepod that almost sold as poorly as Vision Pro despite a $349.99 MSRP?

The HomePod did poorly because competitor offerings with similar and better performing features were priced under $100. The difference in sound quality was not worth the >3x markup.

VagabundoP 283 days ago

Have a team pushing out optimised open-source models. Over time this thing could become the house AI. Basically the Star Trek computer.

ako 283 days ago

They have local LLMs, Apple Foundation Models: https://developer.apple.com/documentation/FoundationModels

andruby 283 days ago

Apple often wants to do it their way. Unfortunately, their foundation models are way behind even the open models.

_delirium 283 days ago

There are local LLM coding models that ship with Xcode now too.

jbs789 283 days ago

Sounds like you’ve got a solid handle on things - go do it!

giancarlostoro 283 days ago

Give me a majority share in AAPL if that's what you want ;)

jbs789 277 days ago

There are things we can all do in our own lives, not necessarily running Apple. I for one am grateful not to be in the public spotlight running Apple! Everyone has opinions. It’s what you do with them that counts.

elAhmo 283 days ago

I think shareholders are fine with Tim Cook as a CEO.

utyop22 283 days ago

I sometimes read posts on here and just laugh.

It's easy to sit in the armchair and say "just be a visionary bro" when they forget Tim worked under Steve for a while before his death; he has some sense and understanding of what it takes to get a great product out of the door.

Nvidia is generating a lot of revenue, sure, but what is the downstream impact for the customers buying its hardware? All they have to show for their spending right now is negative returns. Could this change? Maybe. Is it likely? Not in my view.

As it stands, Apple has made absolutely the right choice in not wasting its cash and is demonstrating discipline. When all this LLM mania quietens, shareholders will respect that.

nxobject 283 days ago

Arguably, it’s why investors go in for Apple in the first place: Apple’s revenue fundamentally comes from consumer spending, whose prospects are relatively well understood by the average investor.

(I think it’s why big shareholders don’t get angry that Apple doesn’t splash their cash around: their core value proposition is focused in a dizzying tech market; take it or leave it. It’s very Warren Buffett.)

moduspol 283 days ago

This. I wouldn’t exactly give them bonus points for the handling of Apple Intelligence, but beyond that, they’ve taken a much more measured and evidence-based approach to LLMs than the rest of big tech.

If it ends up that we are in a bubble and it pops, Apple may be among the least impacted in big tech.

ChrisMarshallNY 283 days ago

Friend of mine, used to work for Apple.

He told me that a popular Apple saying is "We're late to the party, but always best-dressed."

I understand this. I'm not sure their choice of outfit has always been the best, but they have had enough success to continue making money.

billbrown 283 days ago

Toyota did this with the EV mania until they lost their nerve and got rid of Toyoda as CEO. I hope Apple doesn't fall into the same trap. (I never thought Toyota would give in either.)

spease 283 days ago

Yes. And everyone is glossing over the benefit of unified memory for LLM applications. Apple may not have the models, but it has customer goodwill, a platform, and the logistical infrastructure to roll them out. It probably even has the cash to buy some AI companies outright; maybe not the big ones (for a reasonable amount, anyway) but small to midsize ones with domain-specific models that could be combined.

Not to mention the “default browser” leverage it has with iPhones, iPods, and watches.

j45 282 days ago

Unified memory, and examples like the M1 Ultra still being able to hold its own years later, might be something not all Mac and non-Mac users alike have experienced.

It's nice to see 16 GB becoming the minimum; to me it should have been 32 for a long time.

woooooo 283 days ago

Not to mention, build a car with all that cash they have. Xiaomi makes awesome cars; an Apple-branded EV could scoop up all the brand equity that Elon squandered.

saagarjha 283 days ago

One does not simply put a 5090 into an existing chip.

giancarlostoro 283 days ago

Not what I am suggesting. However, having trained a few different things on a modest M4 Pro chip (so not even their most powerful chips, mind you) and having used it for local-first AI inference, I can see the value. A single server could serve an LLM for a small business and cost a lot less than running the same inference through a 5090 in terms of power usage.

I could also see universities giving students this type of compute access more cheaply, to work on more basic, less resource-intensive models.

saagarjha 282 days ago

I think a 5090 will handily beat it on power usage.

__loam 283 days ago

I'm glad Tim is the CEO instead of you.

jasonvorhe 282 days ago

Why? This is something that plays into all of Apple's supposed strengths: Privacy/no strict cloud dependency/on-device compute, hardware/software optimization while owning the stack and combine that with good UI/UX for a broad target audience without sacrificing too much for the power users. OP never said that local AI would be the only topic a new CEO should focus on.

brookst 283 days ago

Under Cook, Apple’s market cap has increased 10x, at a CAGR of 18%.
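The two figures are consistent with each other. This is a quick check that assumes a span of roughly 14 years, since Cook became CEO in August 2011. The exact span is an assumption.

```python
def cagr(multiple: float, years: float) -> float:
    """Compound annual growth rate that turns 1 unit into `multiple` units over `years`."""
    return multiple ** (1.0 / years) - 1.0


def implied_multiple(rate: float, years: float) -> float:
    """Inverse check: total growth implied by a constant annual rate."""
    return (1.0 + rate) ** years


if __name__ == "__main__":
    print(f"10x over 14 years -> CAGR {cagr(10, 14):.1%}")
    print(f"18% for 14 years  -> {implied_multiple(0.18, 14):.1f}x")
```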

Do you really think that they need something different? As a shareholder would you bet on your vision of focusing on server parts?

bigyabai 283 days ago

Software-wise, it makes sense: Nvidia has the IP lead, industry buy-in and supports the OSes everyone wants to use.

Hardware-wise though, I actually agree: Apple has dropped the ball so hard here that it's dumbfounding. They're the only TSMC customer that could realistically ship a volume of chips comparable to Nvidia's, even without really impacting their smartphone business. They have hardware designers who can design GPUs from scratch, write proprietary graphics APIs and fine-tune for power efficiency. The only organizational roadblock that I can see is the executive vision, which has been pretty wishy-washy on AI for a while now. Apple wants to build a CoreML silo in a world where better products exist everywhere; it's a dead-end approach that should have died back in 2018.

Contextually it's weird too: I've seen tons of people defend Cook's relationship with Trump as "his duty to shareholders" and the like. But whenever you mention crypto mining or AI datacenter markets, people act like Apple is above selling products that people want. Future MBAs will be taught about this hubris once the shape of the total damages comes into view.

nxobject 283 days ago

> They have hardware designers who can design GPUs from scratch, write proprietary graphics APIs and fine-tune for power efficiency. The only organizational roadblock that I can see is the executive vision, which has been pretty wishy-washy on AI for a while now.

The vision since Jobs has always been “build a great consumer product and own as much as you can while doing so”. That’s exactly how all of the design parameters of the Ax/Mx series were determined and relentlessly optimized; the fact that they have a highly competitive uarch was a salutary side effect, not a planned goal.

jen20 283 days ago

> But whenever you mention crypto mining or AI datacenter markets, people act like Apple is above selling products that people want.

People also want comfortable mattresses and high quality coffee machines. Should Apple make them too?

Apple not being in a particular industry is a perfectly valid choice, which is not remotely comparable to protecting their interests in the industries they are currently in. Selling datacenter-bound products is something Apple is not _remotely_ equipped for, and staffing up to do so at reasonable scale would not be a trivial task.

As for crypto mining... JFC.

bigyabai 283 days ago

Apple is perfectly well equipped to sell datacenter products. They've done it in the past, even supporting Nvidia's compute drivers along the way. If they have the staff to design consumer-facing and developer-facing experiences, why wouldn't they address the datacenter?

Money is money. 10 years ago people would have laughed at the notion of Nvidia abandoning the gaming market, now it's their most lucrative option. Apple can and should be looking at other avenues of profit while the App Store comes under scrutiny and the Mac market share refuses to budge. It should be especially urgent if unit margins are going down as suppliers leave China.

jen20 283 days ago

> They've done it in the past, even supporting Nvidia's compute drivers along the way. If they have the staff to design consumer-facing and developer-facing experiences, why wouldn't they address the datacenter?

They did a horrific job of it before. The staff to design consumer facing experiences are busy doing exactly that. The developer facing experiences are very lean. The bandwidth simply isn't there to do DC products. Nor is the supply chain. Nor is the service supply chain. Etc, etc.

saagarjha 283 days ago

Apple makes more profit on iPhones than Nvidia does on its entire datacenter business. Why would they want to enter a highly competitive market that they have no expertise in on a whim?

zozbot234 283 days ago

From reverse-engineered information (in the context of Asahi Linux, which can have raw hardware access to the ANE), it seems that the M1/M2 Apple Neural Engine supports only statically scheduled MADDs of INT8 or FP16 values.[0] This wastes a lot of memory bandwidth on padding with newer local models, which are generally more heavily quantized.

(That is, when in-memory model values must be padded to FP16/INT8 this slashes your effective use of memory bandwidth, which is what determines token generation speed. GPU compute doesn't have that issue; one can simply de-quantize/pad the input in fast local registers to feed the matrix compute units, so memory bandwidth is used efficiently.)
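The bandwidth argument can be made concrete. If every weight is read once per generated token, decode speed cannot exceed bandwidth / (bytes stored per token). This minimal sketch uses a hypothetical 8B model and 100 GB/s. It ignores KV-cache reads and caches, so it gives an upper bound only.

```python
def max_decode_tokens_per_s(params_billion: float, bytes_per_weight: float,
                            bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed when every weight is streamed once per token.

    bytes_per_weight is what the weights occupy *in memory*: 0.5 for packed
    4-bit, 1.0 if padded to INT8, 2.0 if padded to FP16. KV-cache reads and
    activations are ignored, so real numbers are lower.
    """
    bytes_per_token = params_billion * 1e9 * bytes_per_weight
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Hypothetical example: an 8B model on a machine with 100 GB/s of bandwidth.
    for label, bpw in [("4-bit, packed", 0.5), ("INT8", 1.0), ("padded to FP16", 2.0)]:
        print(f"{label:15s}: <= {max_decode_tokens_per_s(8, bpw, 100):.1f} tok/s")
```

Padding 4-bit weights out to FP16 in memory cuts the ceiling by 4x. Dequantizing in registers avoids that cost, because memory traffic stays at the packed size.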

The NPU/ANE is still potentially useful for lowering power use in the context of prompt pre-processing, which is limited by raw compute as opposed to the memory bandwidth bound of token generation. (Lower power usage in this context will save on battery and may help performance by avoiding power/thermal throttling, especially on passively-cooled laptops. So this is definitely worth going for.)
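The prefill vs. decode split follows from a roofline-style estimate. Each pass costs roughly max(compute time, memory time). Prompt processing pushes many tokens through the same weights, so compute dominates. Generating one token at a time leaves memory dominant. The sketch below uses made-up hardware numbers (10 TFLOPS, 100 GB/s). It assumes about 2 FLOPs per weight per token and ignores attention and the KV cache.

```python
from typing import Tuple


def step_time(params_billion: float, tokens: int, bytes_per_weight: float,
              peak_tflops: float, bandwidth_gb_s: float) -> Tuple[float, str]:
    """Roofline-style estimate for one forward pass over `tokens` tokens.

    Assumes ~2 FLOPs (one multiply-add) per weight per token and that the
    weights are read from memory once per pass; attention and KV cache ignored.
    Returns (seconds, which resource is the bottleneck).
    """
    params = params_billion * 1e9
    compute_s = 2 * params * tokens / (peak_tflops * 1e12)
    memory_s = params * bytes_per_weight / (bandwidth_gb_s * 1e9)
    bound = "compute" if compute_s > memory_s else "memory"
    return max(compute_s, memory_s), bound


if __name__ == "__main__":
    # Hypothetical 8B model at 4 bits on hypothetical hardware.
    print("decode, 1 token:   ", step_time(8, 1, 0.5, 10, 100))
    print("prefill, 512 tokens:", step_time(8, 512, 0.5, 10, 100))
```

With these numbers, the crossover sits around 25 tokens per pass. Long prompts are therefore where a power-efficient compute block like the ANE could pay off.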

[0] Some historical information about bare-metal use of the ANE is available from the Whisper.cpp pull request: https://github.com/ggml-org/whisper.cpp/pull/1021 Even older information is at: https://github.com/eiln/ane/tree/33a61249d773f8f50c02ab0b9fe... .

More extensive information at https://github.com/tinygrad/tinygrad/tree/master/extra/accel... (from the Tinygrad folks) seems to basically confirm the above.

(The jury is still out for M3/M4 which currently have no Asahi support - thus, no current prospects for driving the ANE bare-metal. Note however that the M3/Pro/Max ANE reported performance numbers are quite close to the M2 version, so there may not be a real improvement there either. M3 Ultra and especially the M4 series may be a different story.)

slacka 284 days ago

I too found it interesting that Apple's Neural Engine doesn't work with local LLMs. It seems like Apple, AMD, and Intel are missing the AI boat by not properly supporting their NPUs in llama.cpp. Any thoughts on why this is?

bigyabai 284 days ago

NPUs are almost universally too weak to use for serious LLM inference. Most of the time you get better performance-per-watt out of GPU compute shaders; the majority of NPUs are dark silicon.

Keep in mind: Nvidia has no NPU hardware because that functionality is baked into their GPU architecture. AMD, Apple and Intel are all in this awkward NPU boat because they wanted to avoid competition with Nvidia and continue shipping simple raster designs.

aurareturn 283 days ago

Apple is in this NPU boat because they are optimized for mobile first.

Nvidia does not optimize for mobile first.

AMD and Intel were forced by Microsoft to add NPUs in order to sell “AI PCs”. Turns out the kind of AI that people want to run locally can’t run on an NPU. It’s too weak like you said.

AMD and Intel both have matmul acceleration directly in their GPUs. Only Apple does not.

bigyabai 283 days ago

Nvidia's approach works just fine on mobile. Devices like the Switch have complex GPGPU pipelines and don't compromise whatsoever on power efficiency.

Nonetheless, Apple's architecture on mobile doesn't have to define how they approach laptops, desktops and datacenters. If the mobile-first approach is limiting their addressable market, then maybe Tim's obsessing over the wrong audience?

aurareturn 283 days ago

MacBooks benefit from mobile optimization. Apple just needs to add matmul hardware acceleration into their GPUs.

numpad0 284 days ago

Perhaps due to sizes? AI/NN models before LLMs were orders of magnitude smaller, as evidenced by effectively all LLMs carrying "Large" in their name regardless of relative size differences.

Someone 284 days ago

I guess that hardware doesn’t make things faster (¿yet?). If it did, I guess they would have mentioned it in https://machinelearning.apple.com/research/core-ml-on-device.... That post is updated for Sequoia and says:

“This technical post details how to optimize and deploy an LLM to Apple silicon, achieving the performance required for real time use cases. In this example we use Llama-3.1-8B-Instruct, a popular mid-size LLM, and we show how using Apple’s Core ML framework and the optimizations described here, this model can be run locally on a Mac with M1 Max with about ~33 tokens/s decoding speed. While this post focuses on a particular Llama model, the principles outlined here apply generally to other transformer-based LLMs of different sizes.”

cma 283 days ago

If it uses a lot less power, it could still be a win for some use cases. On battery, for example, you might still want to run transformer-based speech-to-text, RTX Voice-like microphone denoising, or image generation/infill in photo editing programs. In some use cases, like RTX Voice-style denoising during multiplayer gaming, you might want the GPU free to run the game even if the game still suffers some memory bandwidth impact from the denoiser running.

GeekyBear 284 days ago

There is no NPU "standard".

Llama.cpp would have to target every hardware vendor's NPU individually and those NPUs tend to have breaking changes when newer generations of hardware are released.
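Without a common NPU standard, a runtime ends up keeping one code path per vendor and hardware generation, plus a portable fallback. This hypothetical sketch shows the maintenance cost. The names, such as `vendor_a_npu_gen2`, are made up, and this is not a description of llama.cpp's actual backend code. A dot product stands in for a real matmul kernel.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One kernel signature shared by every backend: a dot product standing in for matmul.
Kernel = Callable[[Sequence[float], Sequence[float]], float]


def dot_cpu(a: Sequence[float], b: Sequence[float]) -> float:
    """Portable reference implementation that always works."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical registry keyed by vendor *and* hardware generation, because each
# NPU generation may need its own (possibly incompatible) implementation.
BACKENDS: Dict[str, Kernel] = {"cpu": dot_cpu}


def register(name: str, kernel: Kernel) -> None:
    """Add or replace the kernel for one vendor/generation."""
    BACKENDS[name] = kernel


def run(preferred: List[str], a: Sequence[float], b: Sequence[float]) -> Tuple[str, float]:
    """Use the first available backend in `preferred`, else fall back to the CPU."""
    for name in preferred:
        kernel = BACKENDS.get(name)
        if kernel is not None:
            return name, kernel(a, b)
    return "cpu", dot_cpu(a, b)


if __name__ == "__main__":
    print(run(["vendor_a_npu_gen2"], [1.0, 2.0], [3.0, 4.0]))  # no such backend yet
    register("vendor_a_npu_gen2", dot_cpu)  # stand-in for a real vendor kernel
    print(run(["vendor_a_npu_gen2", "cpu"], [1.0, 2.0], [3.0, 4.0]))
```

Every new NPU generation with breaking changes means another entry to write, test and maintain. That is the burden the comment describes.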

Even Nvidia GPUs often have breaking changes moving from one generation to the next.

montebicyclelo 284 days ago

I think OP is suggesting that Apple / AMD / Intel do the work of integrating their NPUs into popular libraries like `llama.cpp`. Which might make sense. My impression is that by the time the vendors support a certain model with their NPUs the model is too old and nobody cares anyway. Whereas llama.cpp keeps up with the latest and greatest.

svachalek 283 days ago

I think I saw something that got Ollama to run models on it? But it only works with tiny models. Seems like the neural engine is extremely power efficient but not fast enough to do LLMs with billions of parameters.

reddit_clone 282 days ago

I am running Ollama with 'SimonPu/Qwen3-Coder:30B-Instruct_Q4_K_XL' on a M4 pro MBP with 48 GB of memory.

From Emacs/gptel, it seems pretty fast.

I have never used the proper hosted LLMs, so I don't have a direct comparison. But the above LLM answered coding questions in a handful of seconds.

The cost of memory (and disk) upgrades in apple machines is exorbitant.

jondwillis 283 days ago

https://github.com/Anemll/Anemll

GeekyBear 284 days ago

> Hopefully Apple optimizes Core ML to map transformer workloads to the ANE.

If you want to convert models to run on the ANE there are tools provided:

> Convert models from TensorFlow, PyTorch, and other libraries to Core ML.

https://apple.github.io/coremltools/docs-guides/index.html

ls-a 284 days ago

I thought Apple MLX can do that if you convert your model using it https://mlx-framework.org/

woadwarrior01 283 days ago

MLX does not support the ANE.

https://github.com/ml-explore/mlx/issues/18

elpakal 283 days ago

Yes it does.

That’s just an issue with stale and incorrect information. Here are the docs https://opensource.apple.com/projects/mlx/

woadwarrior01 283 days ago

No, it categorically doesn't. Not just that, its CPU support is quite lacking (fp32 only). Currently, there are two ways to support the ANE: CoreML and MPSGraph.

y1n0 283 days ago

Nothing in that documentation says anything about the Apple Neural Engine. MLX runs on the GPU.

jychang 283 days ago

None of that uses the ANE.

GeekyBear 284 days ago

It does indeed, and is more modern than Core ML.

coffeecoders 284 days ago

It is less about conversion and more about extending ANE support for transformer-style models or giving developers more control.

The issue is in targeting specific hardware blocks. When you convert with coremltools, Core ML takes over and doesn't provide fine-grained control over whether the model runs on the GPU, CPU or ANE. Also, the ANE isn't really designed with transformers in mind, so most LLM inference defaults to the GPU.

aurareturn 283 days ago

Neural Engine is optimized for power efficiency, not performance.

Look for Apple to add matmul acceleration to the GPU instead. That's how to truly speed up local LLMs.

ai-christianson 283 days ago

I can run GLM 4.5 Air and gpt-oss-120b both very reasonably. GPT OSS has particularly good latency.

I'm on a 128GB M4 MacBook. This is "powerful" today, but it will be old news in a few years.

These models are just about getting as good as the frontier models.

ru552 283 days ago

You're better served using Apple's MLX if you want to run models locally.

daemonologist 283 days ago

ONNX Runtime purports to support CoreML: https://onnxruntime.ai/docs/execution-providers/CoreML-Execu... , which gives a decent amount of compatibility for inference. I have no idea to what extent workloads actually end up on the ANE though.

(Unfortunately ONNX doesn't support Vulkan, which limits it on other platforms. It's always something...)

wslh 284 days ago

I find it surprising that you can also do that from the browser (e.g. WebLLM). I imagine that in the near future we will run these engines locally for many use cases instead of via APIs.

witnessme 283 days ago

Don't try 12-20B on 16GB. You should stick with 4-8B instead. Going higher on a 16GB machine gets you far too few tokens/s for only marginal performance improvements.

zackmorris 283 days ago

Don't get me started. Many new computers come with an NPU of some kind, which is redundant next to a GPU.

But what's really going on is that we never got the highly multicore and distributed computers that could have started going mainstream in the 1980s, and certainly by the late 1990s when high-speed internet hit. So single-threaded performance is about the same now as 20 years ago. Meanwhile video cards have gotten exponentially more powerful and affordable, but without the virtual memory and virtualization capabilities of CPUs, so we're seeing ridiculous artificial limitations like not being able to run certain LLMs because the hardware "isn't powerful enough", rather than just having a slower experience or borrowing the PC in the next room for more computing power.

To go to the incredible lengths that Apple went to in designing the M1, not just wrt hardware but in adding yet another layer of software emulation (as they have since the 68000 days), without actually bringing multicore with local memories to the level that today's VLSI design rules could allow, is laughable to me. Or it would be, if it weren't so tragic.

It's hard for me to live and work in a tech status quo so far removed from what I had envisioned growing up. We're practically at AGI, but also mired in ensh@ttification. It's reflected in politics too. We'll have the first trillionaire before we solve world hunger, and I'm bracing for Skynet/Ultron before we have C-3PO/JARVIS.

jondwillis 283 days ago

https://github.com/Anemll/Anemll