| HN Mirror


	by mattmanser 4 hours ago
	That's a 4-bit quant, which the thread OP specifically called out as rubbish (the Q4_K_XL bit, for those not in the know).

2 comments

stymaar 3 hours ago

Anyone calling Qwen3.6-35B-A3B-Q4_K_XL “rubbish” has no idea what they are talking about.

embedding-shape 3 hours ago

I'd agree that the quality degrades a lot between Q8 and Q4. Q4 is borderline unusable, as the models start to fail even at tool-calling syntax. Personally I'd say Q8 is as low as you want to go.

c0rruptbytes 3 hours ago

q4 isn't rubbish, but it's a compromise for good value. q6 is essentially a no-compromise quantization, and in my experience it's what i recommend for MoEs in agentic workflows
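
To put rough numbers on the q4 / q6 / q8 trade-off, here is a minimal back-of-the-envelope sketch. It assumes the "35B" in the model name is the total parameter count and uses nominal bits per weight (4, 6, 8). Real GGUF quants typically average somewhat more than the nominal figure because of block scales and mixed-precision tensors, so treat the results as lower bounds, not measured file sizes. The function name is made up for illustration.

```python
def approx_weight_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    return total_params * bits_per_weight / 8 / 1e9


# Assumption: "35B" in the model name is the total parameter count.
TOTAL_PARAMS = 35e9

# Assumption: nominal bits per weight, not measured averages for any real quant.
NOMINAL_BITS = {"Q4": 4, "Q6": 6, "Q8": 8}

for name, bits in NOMINAL_BITS.items():
    print(f"{name}: ~{approx_weight_gb(TOTAL_PARAMS, bits):.1f} GB")
```

Under those assumptions the weights alone come to about 17.5 GB at 4 bits, about 26 GB at 6 bits, and 35 GB at 8 bits, which is the memory cost being traded against quality here.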

greenavocado 3 hours ago

He's probably calling me out for this comment https://news.ycombinator.com/item?id=48557579

greenavocado 3 hours ago

I typically find myself using a context of between 150k and 500k tokens with GPT models, so local models are simply not enough and I stopped using them.

stymaar 3 hours ago

That's way higher than their optimal ceiling (and absolutely suboptimal from a token cost point of view). Why are you doing that?

greenavocado 3 hours ago

You're 100% right, and it's even more severe than that: I daily drive on xhigh. I really try to avoid it, but when reconciling APIs across two large codebases you really start pressing north of 200k. I find myself topping out at 800k sometimes, and that's with careful context management. I actually had to drop to GPT 5.4 for 1M context in my subscription because GPT 5.5 tops out at 272k. Hitting 800k context is better than repeatedly hitting, let's say, 200k out of 272k with multiple rounds of compaction. I run Can's snapcompact, and while it's better than normal compaction, it still lobotomizes the model more than running with a very high context window.
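
A toy sketch of the "multiple rounds of compaction" point, under loudly stated assumptions: compaction fires whenever the context would exceed a fixed threshold, and each compaction shrinks the context to a fixed size. The 50k post-compaction size is made up for illustration, and this models no real tool (snapcompact included). It only counts how many lossy compaction passes a session of a given total size would go through.

```python
def compaction_rounds(total_tokens: int, threshold: int, kept_after: int) -> int:
    """Count compactions needed to feed total_tokens through a session.

    threshold:  context size at which compaction is triggered (assumed fixed).
    kept_after: context size left after each compaction (assumed fixed).
    """
    if not 0 <= kept_after < threshold:
        raise ValueError("kept_after must be non-negative and below threshold")

    context = 0
    remaining = total_tokens
    rounds = 0
    while remaining > 0:
        room = threshold - context
        if remaining <= room:
            # Everything left fits without another compaction.
            context += remaining
            remaining = 0
        else:
            # Fill the window, then compact down to kept_after.
            remaining -= room
            context = kept_after
            rounds += 1
    return rounds


# Hypothetical numbers: 800k tokens of work, compacting at 200k down to 50k,
# versus a 1M window that never needs to compact.
small_window = compaction_rounds(800_000, threshold=200_000, kept_after=50_000)
large_window = compaction_rounds(800_000, threshold=1_000_000, kept_after=50_000)
print(small_window, large_window)
```

With these illustrative numbers the smaller window goes through four compactions and the 1M window through none. Each compaction is a point where detail can be lost, which is the cost being weighed against long-context degradation.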

c0rruptbytes 3 hours ago

large contexts degrade the performance - attention doesn't work well for large windows like that, and cloud models are kind of hacking it

local models do involve some context engineering to get it okay, but it's not that rough
