| HN Mirror


	by janice1999 42 days ago
	A 24GB Nvidia RTX 3090 TI is ~2000 euro.

2 comments

2ndorderthought 42 days ago

Which is how many months of Claude, or of Claude + ChatGPT for when Claude is down? And do you own anything after using those subscriptions? Can you pick and choose from dozens of models and whatever comes next? Can you play video games with your Claude subscription?

beej71 42 days ago

Believe me when I say that I want to run local models, and I do. But in my testing, 24 GB doesn't get you much brainpower.

2ndorderthought 42 days ago

Have you tried the latest qwen3.6 models?

For most of my questions an 8-9B model works great. Upshot is not having ChatGPT/Meta sell my data or target me with random thoughts later.

ekjhgkejhgk 42 days ago

We're in the same boat. I would rather have NO LLM than an LLM that collects my data (which you should assume is all of them, unless you've been asleep for the last 20 years).

Fortunately, I don't have to pick one or the other. Instead I run Qwen 3.6 35B A3B. It's a bit slow with my 8 GB GPU (I'm in the process of getting a bigger one) but again, to me the choice isn't "what's the best I can get", it's "what's the best local I can get".

entrope 42 days ago

I let Qwen3.6-27B chew on a bug all last night. It choked at some point and stopped responding (probably a context overflow before pi-coding-agent could compact it). Claude Sonnet 4.6 found and fixed the bug in under 10 minutes.

Qwen3.6 is pretty amazing for a 27B model, but it's not hard to run into its limits. With a Radeon R9700 and unsloth's 6-bit quantization, I get ~20 TPS and 110k context, so it can do a fair bit quickly.
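For a rough sense of why a 27B model at 6-bit fits on a single card, here is a back-of-the-envelope sketch. It assumes a flat bits-per-weight figure and counts the weights only; real quantized files mix precisions, and the KV cache for a long context adds memory on top. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Assumption: every weight uses exactly `bits_per_weight` bits.
    Ignores KV cache, activations, and runtime overhead.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Compare a few common precisions for a 27B-parameter model.
    for bits in (16, 8, 6, 4):
        print(f"27B at {bits}-bit: ~{weight_memory_gb(27, bits):.2f} GB")
```

Under those assumptions, 27B at 6 bits comes to roughly 20 GB of weights, before any context is allocated.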

2ndorderthought 42 days ago

You definitely need to watch it more than a model 100 times larger. But the fact that it runs on 1 GPU and does what it does is insane. Imagine what a 30B model will look like in 6 months or a year.

datadrivenangel 42 days ago

Inference is still slow enough to make a meaningful difference. The models are good, but not great, and much slower, which for coding means a 2-3 minute task with Claude Code and Opus takes an hour and has a higher chance of being wrong.

2ndorderthought 42 days ago

It's only slow if you can't afford to run it properly. A lot of people are getting 70-100 tokens per second on 1 GPU.

Not sure what Claude Opus or Sonnet run at. I know that when it goes offline it's 0 tokens per second.

arjie 42 days ago

You can get them used on Reddit for half that price. I have a few. You will not get top-tier intelligence out of them. GPT-5.5 and Claude Sonnet/Opus are in an unbelievable tier. Not all problems need that, though. I have a Qwen-based agent write short websites for me to use and it is adequate to the task.
