| HN Mirror



	by 2ndorderthought 42 days ago
	Have you tried the latest Qwen3.6 models? For most of my questions an 8-9B model works great. The upshot is not having ChatGPT/Meta sell my data or target me with random thoughts later.

3 comments

ekjhgkejhgk 42 days ago

We're in the same boat. I would rather have NO LLM than an LLM that collects my data (which you should assume is all of them, unless you've been asleep for the last 20 years).

Fortunately, I don't have to pick one or the other - instead I run Qwen 3.6 35B A3B. It's a bit slow on my 8 GB GPU (I'm in the process of getting a bigger one) but again, to me the choice isn't "what's the best I can get", it's "what's the best local I can get".


entrope 42 days ago

I let Qwen3.6-27B chew on a bug all last night. It choked at some point and stopped responding (probably a context overflow before pi-coding-agent could compact it). Claude Sonnet 4.6 found and fixed the bug in under 10 minutes.
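
For readers unfamiliar with compaction: the idea is to summarize older turns before the conversation outgrows the model's context window. The sketch below is a generic illustration only, not how pi-coding-agent does it. The 4-characters-per-token estimate, the `headroom` and `keep_recent` parameters, and the `summarize` callback are all assumptions made up for this example.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def compact_if_needed(
    messages: List[str],
    context_limit: int,
    summarize: Callable[[List[str]], str],
    headroom: float = 0.8,   # hypothetical: compact once 80% of the window is used
    keep_recent: int = 4,    # hypothetical: newest messages kept verbatim
) -> List[str]:
    """Replace older messages with one summary when the window is nearly full."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= context_limit * headroom or len(messages) <= keep_recent:
        return messages  # still fits, or nothing old enough to summarize
    split = len(messages) - keep_recent
    old, recent = messages[:split], messages[split:]
    return [summarize(old)] + recent
```

The failure described above is what happens when a single step adds more tokens than the remaining headroom, so a check like this never gets a chance to run before the window overflows.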

Qwen3.6 is pretty amazing for a 27B model, but it's not hard to run into its limits. With a Radeon R9700 and unsloth's 6-bit quantization, I get ~20 TPS and 110k context, so it can do a fair bit quickly.


2ndorderthought 42 days ago

You definitely need to watch it more than a model 100 times larger. But the fact that it runs on 1 GPU and does what it does is insane. Imagine what a 30B model will look like in 6 months or a year.


datadrivenangel 42 days ago

Inference is still slow enough to make a meaningful difference. The models are good but not great, and much slower, which for coding means a 2-3 minute task with Claude Code and Opus takes an hour and has a higher chance of being wrong.


2ndorderthought 42 days ago

It's only slow if you can't afford to run it properly. A lot of people are getting 70-100 tokens per second on 1 GPU.

Not sure what Claude Opus or Sonnet run at. I know that when it goes offline, it's 0 tokens per second.
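
To put the speeds quoted in this thread (~20 TPS, 70-100 TPS) side by side, here is a back-of-the-envelope calculation of generation time at a steady decode rate. The 50,000-token task size is a made-up figure for illustration, and the estimate ignores prompt processing, tool calls, and retries, so real agent runs will take longer.

```python
def generation_minutes(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock minutes to generate num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second / 60.0


# Hypothetical task size; not a figure from the thread.
TASK_TOKENS = 50_000

for label, tps in [("~20 TPS", 20.0), ("70 TPS", 70.0), ("100 TPS", 100.0)]:
    print(f"{label}: {generation_minutes(TASK_TOKENS, tps):.1f} min")
# prints:
# ~20 TPS: 41.7 min
# 70 TPS: 11.9 min
# 100 TPS: 8.3 min
```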
