Hacker News new | ask | show | jobs
by mattmanser 4 hours ago
That's a quant 4 which the thread OP specifically called out as rubbish.

The Q4_K_XL bit for those not in the know.

2 comments

Anyone calling Qwen3.6-35B-A3B-Q4_K_XL “rubish” has no idea what they are talking about.
I'd agree that the quality degrades a lot between Q8 and Q4, borderline unusable as they start to fail with tool calling syntax even. Personally I'd say Q8 is as low as you want to go.
q4 isn't rubbish, but it's a compromise for a good value, q6 is essentially a no-compromise quantization and it's what i recommend for MoEs in my experience for agentic workflows
He's probably calling me out for this comment https://news.ycombinator.com/item?id=48557579
I typically find myself using a context of between 150-500k with GPT models so local models are simply not enough and I stopped using them.
That's way higher than their optimal ceiling (and absolutely suboptimal from a token cost point of view), why are you doing that?
You're 100% right and its even severe than that: I daily drive on xhigh. I really try to avoid it, but when reconciling APIs across two large codebases you really start pressing north of 200k. I find myself topping out at 800k sometimes and that's with careful context management. I actually had to drop to GPT 5.4 for 1M context in my subscription because GPT 5.5 tops out at 272k. Hitting 800k context is better than repeatedly hitting let's say 200k out of 272k with multiple rounds of compaction. I run Can's snapcompact and while its better than normal compaction it still lobotomizes the model more than running with a very high context window.
large contexts degrade the performance - attention doesn't work will for large windows like that and cloud models are kind of hacking it

local models do involve some context engineering to get it okay, but it's not that rough