Hacker News
by damnitbuilds 23 days ago
I got Qwen 3.6 running locally on 12GB VRAM.

It went:

  AI: "I see you are building a Django project. How can I help?"

  Me: "When I click on the Reload button, it does not set the reload option correctly. Fix this"

     <10 minutes>

  AI: "I see you are building a Django project. How can I help?"
Needs more tweaking of the context window, I think.

Seriously, though, I agree that this is the future, once OpenAI et al. have gone bust.

3 comments

I think this is the key issue with running locally hosted models.

Yes, technically you can run them on 12 GB of VRAM.

But should you?

Realistically, 64 GB seems to be the current threshold for getting meaningful work done while also maintaining a large enough context window.
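
For a rough sense of why the context window eats memory, here is a back-of-the-envelope sketch of KV-cache size for a standard transformer decoder with grouped-query attention. The layer count, KV-head count and head dimension below are made-up illustrative values, not Qwen's actual config, and the estimate ignores the model weights themselves as well as tricks like KV-cache quantization or hybrid attention layers.

```python
def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV-cache size in GiB for a plain transformer decoder.

    bytes_per_elem=2 assumes an fp16/bf16 cache (an assumption, not a given).
    """
    # Factor of 2: one key vector and one value vector per token, per layer.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / 2**30


# Hypothetical model shape, chosen only to make the arithmetic concrete.
HYPOTHETICAL_SHAPE = dict(n_layers=48, n_kv_heads=8, head_dim=128)

for ctx in (8_192, 32_768, 131_072):
    size = kv_cache_gib(context_len=ctx, **HYPOTHETICAL_SHAPE)
    print(f"{ctx:>7} tokens: {size:.1f} GiB")
```

With those assumed numbers the cache alone goes from 1.5 GiB at 8K tokens to 24 GiB at 128K, which is on top of whatever the weights take. On a 12 GB card something has to give, and it is usually the context.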

This will drop further as intelligence density increases.

It should, which is why I said it is the current threshold.

I tweaked it and now I get good, better-than-Copilot answers on local hardware, just a little bit slower than Copilot (okay, ~10 minutes vs. ~1 minute).

I can take that for the joy of running this locally!

I think it's a huge bubble about to pop. I get that enterprises are like elephants, slow to move, locked into agreements.

But I think free is going to be infinitely better than paying Anthropic more money than you used to spend on your human payroll. The big pop is coming.