Hacker News new | ask | show | jobs
by spoaceman7777 64 days ago
I'm somewhat confused as to why this is on the front page. It doesn't go into any real detail, and the advice it gives is... not good. You should definitely not be quantizing your own gguf's using an old method like that hf script. There are lots of ways to run LLMs via podman (some even officially recommended by the project!). The chip has been out for almost a year now, and its most notable (and relevant-to-AI) feature is not mentioned in this article (it's the only x86_64 chip below workstation/server grade that has quad-channel RAM-- and inference is generally RAM constrained). I'm also quite puzzled about this bit about running pytorch via uv.

Anyway. I wouldn't recommend following the steps posted in there. Poke around google, or ask your friendly neighborhood LLM for some advice on how to set up your Strix Halo laptop/desktop for the tasks described. A good resource to start with would probably be the unsloth page for whichever model you are trying to run. (There are a few quantization groups that are competing for top-place with gguf's, and unsloth is regularly at the top-- with incredible documentation on inference, training, etc.)

Anyway, sorry to be harsh. I understand that this is just a blog for jotting down stuff you're doing, which is a great thing to do. I'm mostly just commenting on the fact that this is on the front page of hn for some reason.

3 comments

Thanks for writing this comment, I think seeing someone’s “first impressions” and then seeing someone else’s response to those thoughts is more interesting and feels more connected socially than just reading a “correct” guide or similar especially when it’s something I’m curious about but wouldn’t necessarily be motivated enough to actually try out myself.
Agreed. Been running a Strix Halo box since mid-2025. Lemonade builds of llama.cpp with Unsloth or Bartowski quants have proven to be excellent.
Quad-channel RAM is common on consumer desktops. Strix Halo has *8* channels, and also very fast RAM (soldered RAM can be faster than dimms because the traces are shorter.)
Quad channel memory is not common on consumer desktops, it's a strictly HEDT and above feature. The vast majority of consumer desktops have 2 channels or fewer.
One should no longer use the word "channel" because the width of a channel differs between various kinds of memories, even among those that can be used with the same CPU (e.g. between DDR and LPDDR or between DDR4 and DDR5).

For instance, now the majority of desktops with DDR5 have 4 channels, not 2 channels, but the channels are narrower, so the width of the memory interface is the same as before.

To avoid ambiguities, one should always write the width of the memory interface.

Most desktop computers and laptop computers have 128-bit memory interfaces.

The cheapest desktop computers and laptop computers, e.g. those with Intel Alder Lake N/Twin Lake CPUs, and also many smartphones and Arm-based SBCs, have 64-bit memory interfaces.

Cheaper smartphones and Arm-based SBCs have 32-bit memory interfaces.

Strix Halo and many older workstations and many cheaper servers have 256-bit memory interfaces.

High-end servers and workstations have 768-bit or 512-bit memory interfaces.

It is expected that future high-end servers will have 1024-bit memory interfaces per socket.

GPUs with private memory have usually memory interfaces between 192-bit and 1024-bit, but newer consumer GPUs have usually narrower memory interfaces than older consumer GPUs, to reduce cost. The narrower memory interface is compensated by faster memories, so the available bandwidth in consumer GPUs has been increased much slower than the increase in GDDR memory speed would have allowed.

>now the majority of desktops with DDR5 have 4 channels, not 2 channels

Source? I just looked up two random X870E boards from Gigabyte and both are dual channel.

>To avoid ambiguities, one should always write the width of the memory interface.

They're incomparable quantities. More channels support more parallel operations, while a wider bus at a constant frequency supports higher throughput.

The bus width is not even that useful of a metric. It's more useful to talk about bits per second, which is the product of bus width and frequency.

Sadly motherboards, tech journalist, and many other places confuse the difference between a dimm and channel. The trick is the DDR4 generation they were the same, 64 bits wide. However a standard DDR5 dimm is not 1x64 bit, it's actually 2x32 bit. Thus 2 DDR5 dimms = 4 channels.

For some workloads the extra channels help, despite having the same bandwidth. This is one of the reasons that it's possible for a DDR5 system to be slightly faster than a DDR4 system, even if the memory runs at the same speed.

>However a standard DDR5 dimm is not 1x64 bit, it's actually 2x32 bit. Thus 2 DDR5 dimms = 4 channels.

Uh, surely that depends on how the motherboard is wired. Just because each DIMM has half the pins on one channel and the other half on another, doesn't mean 2 DIMM = 4 channels. It could just be that the top pins over all the DIMMs are on one channel and the bottom ones are on another.

> Quad-channel RAM is common on consumer desktops

Yes, but tablets, laptops, and normal (non-HEDT) desktops have 4 channels, 4x32 bit = 128 bit wide. Modern memory with DDR5 allows two 32 bit channels on a 64 bit dimm. The previous gen DDR4 would allow 1 64 bit channel on a 64 bit dimm.

So strix halo (on laptops, tablets, and desktops) allows for a 256 bit wide memory system, providing twice the memory bandwidth of any ryzen or intel i3/i5/i7/i9. The Apple pro (256 bit), max (512 bit), and ultra (1024 bit) lines of apple silicon have greater than 128 bit wide memory systems. On the AMD size it's just the Threadripper (256 bit) and Threadripper pro (512 bit), but those are typically in expensive workstations that are physically large, expensive, and need substantial cooling.

So the HALO is pretty unique (outside of Apple) for providing twice the memory bandwidth of anything else that fits in the tablet, laptop, or small desktop category.

4 DIMMS =/= 4 channels
I knew that, but I still thought most desktops with 4 dimm slots supported quad-channel memory. I guess I was wrong.