

by 999900000999 214 days ago

With these new developments, are there any implications for getting LLMs running well on consumer AMD chips?

For example, the laptop below, which I'm thinking of picking up, has both a strong AMD CPU/iGPU and an RTX 5080. Could we see the AMD side competing with the RTX?

I know a dedicated GPU will always be faster though.

>HP OMEN MAX 16-ak0003nr 16" Gaming Laptop Computer - Shadow Black Aluminum AMD Ryzen AI 9 HX 375 (2.0GHz) Processor; NVIDIA GeForce RTX 5080 16GB GDDR7; 32GB DDR5-5600 RAM; 1TB Solid State Drive

2 comments

ehnto 214 days ago

I run Qwen3 Coder 30b through Ollama on an RX 7900 XTX. It works great, though I suspect some load gets passed to the 32 GB of system memory and the Ryzen 7 CPU.

It's not quite as fast as like Sonnet 4 from an API, but it's really not that bad.

It's really great for quick questions so I don't have to google stuff, and it's probably at Sonnet 4 level of competency at achieving coding tasks.

No API served model has been fast enough to remove the urge to do something else while waiting for bigger tasks, so the UX is more or less the same in that regard.

Opencode + Ollama + Qwen3 Coder has been a very reasonable alternative to Claude Code with Sonnet 4.

That is amazing for something running locally.

If you actually need AI to be doing all your coding, you may feel differently about the setup. But as a small assistant it's great.
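For anyone wanting to try a similar setup from a script, here is a minimal sketch of sending one prompt to a locally running Ollama server over its HTTP API. It assumes Ollama is listening on its default port (11434) and that the model was pulled under the tag `qwen3-coder:30b`; that tag and the helper names `build_request` and `ask` are assumptions for illustration, not something taken from the setup described above.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumed model tag; run `ollama list` to see what is installed locally.
MODEL_TAG = "qwen3-coder:30b"


def build_request(prompt, model=MODEL_TAG):
    """Build the POST request without sending it."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model=MODEL_TAG, timeout=300):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("Write a Python function that reverses a string.")` returns the model's reply as a string, provided the server is up and the model is installed. The generous timeout is a guess to allow for the first request loading the model into memory.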


christkv 214 days ago

That's great. I have been eyeing a Strix Halo and was wondering how well smaller models are doing. This is great news from the perspective of running local agents.


JonChesterfield 214 days ago

I got one of those running Whisper yesterday, and I'm hopeful the bigger LLMs will run shortly. You'd need ROCm 7, which seems to be much better than 6.4 was.


christkv 213 days ago

Is the performance decent? I'm looking at using it with 30b coding models and a local agent framework like Goose, to see if we can do this locally as developers instead of risking leaking code to the big models.


JonChesterfield 212 days ago

The chip in general is fast: it builds LLVM in about 12 minutes. Whisper on it is at least real time, but I only ran the stream binary before sending the box away to SC25. I'm expecting it to need some work to exploit the zero-copy access the APU permits. So it probably will be fast but isn't just yet, at least on my toolchain.


electroglyph 214 days ago

Not the best model to use as a showcase; it's blistering fast on anything that isn't a toaster.


ehnto 214 days ago

Great! That's what I am pointing out, it's a 30b param model that fits into an AMD card and runs great. That's what we want.


fulafel 214 days ago

You might think that a dGPU is always faster, but the limited memory capacity bites you there (unless you go to datacenter dGPUs that cost tens of thousands). Look at e.g. https://www.ywian.com/blog/amd-ryzen-ai-max-plus-395-native-... or the various high-end Mac results.
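To make the capacity point concrete, here is a rough back-of-the-envelope sketch of how much memory a model's weights need at a given quantization. The 4 GiB allowance for KV cache and runtime buffers is an assumed placeholder rather than a measured figure, real quantized files carry extra metadata and often mix precisions, and the function names are made up for this example.

```python
def weight_footprint_gib(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion, bits_per_weight, memory_gib, overhead_gib=4.0):
    """True if weights plus an assumed overhead fit in the given memory.

    overhead_gib is a guess covering KV cache and runtime buffers;
    the real number depends on context length and the runtime used.
    """
    needed = weight_footprint_gib(params_billion, bits_per_weight) + overhead_gib
    return needed <= memory_gib


# A 30b model at 4 bits per weight versus 16 bits per weight.
print(round(weight_footprint_gib(30, 4), 2))   # 13.97
print(round(weight_footprint_gib(30, 16), 2))  # 55.88

# Treating advertised capacities as GiB for simplicity.
print(fits(30, 4, 16))   # False: tight on a 16 GB card once overhead is counted
print(fits(30, 4, 24))   # True
print(fits(30, 16, 64))  # True: large shared memory helps with bigger weights
```

Under these assumptions a 4-bit 30b model is a squeeze on a 16 GB card and comfortable in 24 GB, while anything close to full precision only fits in a large shared-memory pool. Speed is a separate question, since memory bandwidth differs a lot between those options.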


999900000999 213 days ago

So I want this Thinkpad.

https://www.lenovo.com/us/en/p/laptops/thinkpad/thinkpadp/th...?

Processor: AMD Ryzen™ AI 9 HX PRO 370 (2.00 GHz up to 5.10 GHz); Operating System: Windows 11 Pro 64; Graphics Card: Integrated AMD Radeon™ 890M; Memory: 64 GB DDR5-5600MT/s (SODIMM) (2 x 32 GB)

But I also seriously want to run LLMs. My hunch is that a gaming laptop is the best way to do this on the go without spending $5,000 on a Thinkpad with a high-end graphics card.
