Hacker News
by 999900000999 214 days ago
With these new developments, are there any implications for getting LLMs running well on consumer AMD chips?

For example, the following laptop, which I'm thinking of picking up, has both a strong AMD CPU/iGPU and an RTX 5080. Could we see the AMD side competing with the RTX?

I know a dedicated GPU will always be faster, though.

>HP OMEN MAX 16-ak0003nr 16" Gaming Laptop Computer - Shadow Black Aluminum AMD Ryzen AI 9 HX 375 (2.0GHz) Processor; NVIDIA GeForce RTX 5080 16GB GDDR7; 32GB DDR5-5600 RAM; 1TB Solid State Drive

2 comments

I run Qwen3 Coder 30B through Ollama on an RX 7900 XTX. It works great; I suspect some load gets passed to the 32 GB of system memory and the Ryzen 7 CPU.

It's not quite as fast as, say, Sonnet 4 from an API, but it's really not that bad.

It's really great for quick questions so I don't have to google stuff, and it's probably at Sonnet 4's level of competency for coding tasks.

No API served model has been fast enough to remove the urge to do something else while waiting for bigger tasks, so the UX is more or less the same in that regard.

OpenCode + Ollama + Qwen3 Coder has been a very reasonable alternative to Claude Code with Sonnet 4.

That is amazing for something running locally.

It's possible that if you actually need AI to be doing all your coding, you're going to feel differently about the setup. But as a small assistant it's great.
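For anyone who wants to try the Ollama part of this setup, here is a minimal sketch of sending one prompt to a locally running Ollama server over its HTTP API. It assumes Ollama is listening on its default port (11434) and that the model was pulled under the tag `qwen3-coder:30b`; that tag is a guess, so check `ollama list` for the name on your machine. `build_request` and `ask` are just illustrative helper names.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: the model tag; replace with whatever `ollama list` shows.
MODEL = "qwen3-coder:30b"


def build_request(prompt: str, model: str = MODEL, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(ask("Write a Python function that reverses a string."))
```

With `"stream": False` the server returns a single JSON object, which keeps the sketch short; a coding agent would normally stream tokens instead.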

That's great. I have been eyeing a Strix Halo and was wondering how well smaller models are doing. This is great news from the perspective of running local agents.
I got one of those running Whisper yesterday; hopefully the bigger LLMs will run shortly. You'd need ROCm 7, which seems to be much better than 6.4 was.
Is the performance decent? I'm looking at using it with 30b coding models with a local agent framework like goose to see if we can do this locally as developers instead of risking leaking code to the big models.
The chip in general is fast; it builds LLVM in ~12 min or so. Whisper on it is at least real time, but I only ran the stream binary before sending the box away to SC25. I'm expecting it to need some work to exploit the zero copy the APU permits. So it probably will be fast but isn't just yet, at least on my toolchain.
Not the best model to use as a showcase; it's blistering fast on anything that isn't a toaster.
Great! That's what I am pointing out, it's a 30b param model that fits into an AMD card and runs great. That's what we want.
You might think that a dGPU is always faster, but the limited memory capacity bites you there (unless you go to datacenter dGPUs that cost tens of thousands). Look at e.g. https://www.ywian.com/blog/amd-ryzen-ai-max-plus-395-native-... or the various high-end Mac results.
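To put rough numbers on the memory-capacity point, here is a back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight) plus a flat headroom for KV cache and runtime buffers. The 2 GB headroom is an arbitrary placeholder, real quantized files carry some extra overhead, and how much of a laptop's system RAM the iGPU can actually use depends on the platform, so treat the output as a sanity check rather than a benchmark.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         capacity_gb: float, headroom_gb: float = 2.0) -> bool:
    """True if weights plus an assumed flat headroom fit in the given memory."""
    return weight_gb(params_billion, bits_per_weight) + headroom_gb <= capacity_gb


if __name__ == "__main__":
    # Capacities mentioned in this thread; the 64 GB is shared with the OS.
    devices = [
        ("16 GB dGPU (RTX 5080 laptop)", 16),
        ("24 GB dGPU (7900 XTX)", 24),
        ("64 GB system RAM (iGPU, shared)", 64),
    ]
    for name, capacity in devices:
        for bits in (4, 8, 16):
            verdict = "fits" if fits(30, bits, capacity) else "does not fit"
            print(f"30B @ {bits}-bit ({weight_gb(30, bits):.0f} GB) on {name}: {verdict}")
```

Under these assumptions a 30B model at 4-bit needs about 15 GB for weights, so it is a squeeze on a 16 GB card but comfortable on 24 GB, and only the large shared-memory configurations have room for 8-bit.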
So I want this ThinkPad.

https://www.lenovo.com/us/en/p/laptops/thinkpad/thinkpadp/th...?

AMD Ryzen™ AI 9 HX PRO 370 Processor (2.00 GHz up to 5.10 GHz) Operating System Windows 11 Pro 64 Graphic Card Integrated AMD Radeon™ 890M Memory 64 GB DDR5-5600MT/s (SODIMM)(2 x 32 GB)

But I also seriously want to run LLMs. My hunch is that a gaming laptop is the best way to do this on the go without spending $5000 on a ThinkPad with a high-end graphics card.