by ehnto 214 days ago
I run Qwen3 Coder 30B through Ollama on an RX 7900 XTX. It works great, though I suspect some load gets passed to the 32 GB of system memory and the Ryzen 7 CPU.
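
A rough back-of-the-envelope check on that suspicion, as a sketch only: it counts the weights alone and assumes uniform bits per weight, which real quantized files only approximate. The 24 GB figure is the 7900 XTX's VRAM; the quantization levels are illustrative assumptions, not what Ollama necessarily pulled here.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in decimal GB."""
    return params_billion * bits_per_weight / 8

VRAM_GB = 24  # VRAM on a 7900 XTX

# Assumed quantization levels, for illustration only.
for bits in (16, 8, 4):
    size = weight_gb(30, bits)
    print(f"{bits}-bit: ~{size:.0f} GB of weights, under {VRAM_GB} GB: {size < VRAM_GB}")
```

Even when the weights fit, the KV cache and runtime buffers need memory on top, so some spill to system RAM with long contexts would not be surprising.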

It's not quite as fast as, say, Sonnet 4 from an API, but it's really not that bad.

It's really great for quick questions so I don't have to google stuff, and it's probably at Sonnet 4's level of competency on coding tasks.

No API served model has been fast enough to remove the urge to do something else while waiting for bigger tasks, so the UX is more or less the same in that regard.

OpenCode + Ollama + Qwen3 Coder has been a very reasonable alternative to Claude Code with Sonnet 4.
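
For the "quick questions" use case, a minimal sketch of talking to a local Ollama server directly, without an agent front end. It assumes Ollama is listening on its default port (11434) and that the model was pulled under the tag `qwen3-coder:30b`; the tag and the helper names `build_request` and `ask` are assumptions for illustration, so check `ollama list` for the tag on your machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen3-coder:30b"  # assumed tag; yours may differ

def build_request(prompt, model=MODEL, url=OLLAMA_URL):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(prompt):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(ask("How do I reverse a list in Python?"))` would print the model's answer.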

That is amazing for something running locally.

It's possible that if you actually need AI to be doing all your coding, you're going to feel differently about the setup. But as a small assistant it's great.

2 comments

That's great. I have been eyeing a Strix Halo and was wondering how well smaller models are doing. This is great news from the perspective of running local agents.
I got one of those running Whisper yesterday, and I'm hopeful the bigger LLMs will run shortly. You'd need ROCm 7, which seems to be much better than 6.4 was.
Is the performance decent? I'm looking at using it with 30B coding models and a local agent framework like Goose, to see if we can do this locally as developers instead of risking leaking code to the big models.
The chip in general is fast; it builds LLVM in ~12 minutes or so. Whisper on it is at least real time, but I only ran the stream binary before sending the box away to SC25. I'm expecting it to need some work to exploit the zero copy the APU permits. So it probably will be fast but isn't just yet, at least on my toolchain.
Not the best model to use as a showcase; it's blistering fast on anything that isn't a toaster.
Great! That's what I am pointing out: it's a 30B-param model that fits into an AMD card and runs great. That's what we want.