Hacker News new | ask | show | jobs
by roscas 311 days ago
From my experience, qwen3-coder is way better. I only have gpt-oss:20b installed to make a few more tests but I give it a program to make a summary of what it does and qwen3 just works in a few seconds, while gpt-oss was cancelled after 5 minuts... doing nothing.

So I just use qwen3. Fast and great ouput. If for some reason I don't get what I need, I might use search engines or Perplexity.

I have a 10GB 3080 and Ryzen 3600x with 32gb of RAM.

Qwen3-coder is amazing. Best I used so far.

5 comments

Qwen3 coder 480B is quite good and on par with Sonnet 4. It’s the first time I realized the Chinese models are probably going to eclipse US-based models pretty soon, at least for coding.
Where do you use qwen3 480b from, I'm not even seeing it on Openrouter. EDIT nm, openrouter is just calling it qwen3-coder-- when I click for more info it shows it's Qwen3-Coder-480B-A35B-Instruct. And it's one of their free models. Nice
cerebras code (both sub and api) have it
what edit format u use with Qwen? https://aider.chat/docs/more/edit-formats.html

diff is failing me or do you guys use whole?

That might be a stretch, maybe Sonnet 3.5. But it is pretty impressive as is Kimi on opencode.
I've been using lightly gpt-oss-20b but what I've found is that for smaller (single sentence) prompts it was easy enough to have it loop infinitely. Since I'm running it with llama.cpp I've set a small repetition penalty and haven't encountered those issues since (I'm using it a couple of times a day to analyze diffs, so I might have just gotten lucky since)
I had the same issue with other models where they would loop repeating the same character, sentence or paragraph indefinitely. Turns out the context size some tools set by default is 2k and this is way too small.
I’ve been using the ollama version (uses about 13 Gb RAM on macOS) and haven’t had that issue yet. I wonder if that’s maybe an issue of the llama.cpp port?
Never used ollama, only ready to go models via llamafile and llama.cpp.

Maybe ollama has some defaults it applies to models? I start testing models at 0 temp and tweak from there depending how they behave.

The 20B version doesn't fit in 10GB. That might explain some issues?
Are you using this in an agentic way or in a copy and paste and “code this” single input single output way?

I’d like to know how far the frontier models are from the local for agentic coding.

What Qwen3-Coder model are you using? Quantized or not?

Asking because I'm looking for a good model that fits in 12GB VRAM.