HN Mirror

by lelandbatey 21 days ago

A gaming PC can already host models that perfectly serve casual users who just want recipes, todo tracking, picture identification, etc. E.g. Qwen 3.6 35b which will run on a $650 GPU at 75 t/s (Nvidia 1660 ti 16GB).

Said model will also run excellently as a tool-calling coding model (it's no Opus, but for a thing that once set up is just the cost of energy, it's incredible). It can type faster than you can, probably 10x faster, so with guidance it'll make you faster. And it's free.

It's here. If folks want ChatGPT without a subscription, they can have it today on their computer. The only money to be made is in the high end models doing "serious business" work spanning 1M+ token contexts and massive uncertainty. Everything else is already set to be eaten by today's local models.

2 comments

simonw 21 days ago

The problem with models like Qwen 3.6 35B (which really is an excellent model) is that my expectations of what a model can do have gone SO high now.

Here's a prompt I just ran against Claude Opus 4.7:

> Use python3 to experiment with whether the SQLite3 authorizer mechanism can be used to detect an INSERT OR REPLACE based just on running an explain query without examining the SQL string itself

Opus nailed it: https://claude.ai/share/c4212606-3fee-4b7c-bc97-505e0348ccac

I tried the same thing against qwen/qwen3.5-35b-a3b running locally in LM Studio, with the Pi coding agent. At first it looked like it was going to do great! And then it fell apart over the course of several tool calls: https://gisthost.github.io/?8ae2f842df619fb7fd8f1ccd82fe41c7

I'm used to GPT-5.5 and Opus 4.7 handling that kind of prompt without any problems at all.

lelandbatey 20 days ago

Something is definitely going wrong with your Qwen setup: in the link you posted, it starts and ends with a compaction step due to a 4k token context limit. Qwen 35b supports, I think, a context of 200k+ tokens (though I only run with 128k), so that seems to be a major source of the problem.

simonw 20 days ago

Good call, I need to check if LM Studio is misconfigured.

scribble0242 20 days ago

This worked for me with qwen3.6-36b-a3b, even at a q4 quant. I ran Pi in a Docker container, and it had to figure out how to install Python as well. I used the same initial prompt you had, without anything additional. You talked about Qwen 3.6 but then said you tried Qwen 3.5 in LM Studio; not sure if you meant Qwen 3.6. I ran it with llama.cpp's llama-server, using the recommended settings from Unsloth.

I'm not an expert in SQLite, so I can't say if this is 100% correct, but it seemed directionally similar to the conclusion from Claude.

  ### TL;DR
  
  - Authorizer + EXPLAIN:  No — authorizer only sees SQLITE_INSERT, not VDBE opcodes
  - EXPLAIN opcode analysis alone:  Yes — Delete opcode at position 10 is the unique signature of INSERT OR REPLACE / REPLACE
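For anyone who wants to reproduce this kind of comparison, here is a minimal sketch using only Python's standard `sqlite3` module. The table `t`, the helper name `probe`, and the sample statements are illustrative assumptions, not taken from either transcript. It records which action codes the authorizer callback receives while an `EXPLAIN` statement is prepared, then lists the VDBE opcodes that `EXPLAIN` returns. Which opcodes appear (including whether a `Delete` shows up, and at what position) likely depends on the schema and the SQLite version, so the script prints the difference rather than assuming it.

```python
import sqlite3


def probe(sql):
    """Run EXPLAIN on `sql`; return (authorizer action codes, opcode names)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: a UNIQUE column gives REPLACE a conflict to resolve.
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    actions = []

    def authorizer(action, arg1, arg2, db_name, source):
        actions.append(action)      # e.g. sqlite3.SQLITE_INSERT
        return sqlite3.SQLITE_OK    # allow everything; we only observe

    # Installed after CREATE TABLE so only the EXPLAIN statement is recorded.
    conn.set_authorizer(authorizer)
    rows = conn.execute("EXPLAIN " + sql).fetchall()
    conn.close()
    # Each EXPLAIN row is (addr, opcode, p1, p2, p3, p4, p5, comment).
    return actions, [row[1] for row in rows]


plain_actions, plain_ops = probe("INSERT INTO t (id, name) VALUES (1, 'a')")
replace_actions, replace_ops = probe(
    "INSERT OR REPLACE INTO t (id, name) VALUES (1, 'a')"
)

print("authorizer saw INSERT (plain):  ", sqlite3.SQLITE_INSERT in plain_actions)
print("authorizer saw INSERT (replace):", sqlite3.SQLITE_INSERT in replace_actions)
print("authorizer saw DELETE (replace):", sqlite3.SQLITE_DELETE in replace_actions)
print("opcodes only in REPLACE plan:   ", sorted(set(replace_ops) - set(plain_ops)))
```

If the authorizer lines come out identical for both statements while the opcode difference is non-empty, that would match the TL;DR above; treat the result as specific to the schema and SQLite build you ran it on.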

I can't help but think the not-so-distant future will see language models expected on commodity personal computing devices.

simonw 20 days ago

OK that's a very good answer! Do you mind sharing the transcript?

scribble0242 20 days ago

Sure, I cleaned up the jsonl session file a little here: https://pastebin.com/PL9EPn9Y

I tried it a second time, and it spent a lot of time trying to figure out some authorization issue, so definitely not a slam dunk. I might run it a few more times for science. But while this is a new model it's also quite lightweight, and as hardware adapts and improves it seems inevitable that for many use-cases a packaged language model running locally will do the trick.

Balinares 20 days ago

So one of the prominent LLM advocates known for testing every model shared a prompt intended to exhibit Opus 4.7 capabilities, and Qwen 3.6 sorted it out okay? Interesting.

Not saying they're equivalent; local models still decohere much quicker as the context grows, in my experience. But... Interesting.

whattheheckheck 20 days ago

That's when you build a better Ralph loop around your LLM so that it converges on an answer instead of relying on one-shots.
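One way to read this suggestion, sketched under assumptions: rerun the model on the same task until an external check accepts the result, instead of trusting a single attempt. The names `run_until_verified`, `attempt`, and `verify` are hypothetical, and the "model" below is a stub so the example runs without any LLM. A real loop would call the agent and use tests or a script as the check.

```python
from typing import Callable, Optional


def run_until_verified(
    attempt: Callable[[int], str],
    verify: Callable[[str], bool],
    max_attempts: int = 5,
) -> Optional[str]:
    """Call `attempt` repeatedly; return the first result `verify` accepts."""
    for i in range(max_attempts):
        candidate = attempt(i)
        if verify(candidate):
            return candidate
    return None  # never converged within the attempt budget


# Stub standing in for an LLM call: wrong twice, then right.
canned = ["SELECT 1", "EXPLAIN", "EXPLAIN INSERT OR REPLACE INTO t VALUES (1)"]


def fake_model(i: int) -> str:
    return canned[min(i, len(canned) - 1)]


def mentions_replace(answer: str) -> bool:
    # Hypothetical check; a real one might run the answer against a test suite.
    return "REPLACE" in answer


result = run_until_verified(fake_model, mentions_replace)
print(result)
```

The loop is only as good as the check: a weak `verify` lets a wrong answer through on the first pass, and a capped `max_attempts` keeps a model that never converges from running forever.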

vineyardmike 20 days ago

> a thing that once set up is just the cost of energy

I don't think we can discount this, frankly. Newer electronics are energy efficient, but older devices are more energy-intensive, and unless configured well, a gaming PC can easily use a few dollars a month in electricity, so now you're approaching subscription territory. A subscription comes with no upfront cost, higher reliability, no wasted space in your home, mobile apps, etc. (and less privacy).
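A rough worked version of the electricity estimate, as a sketch. The wattage, hours of use, and price per kWh below are illustrative assumptions, not measurements; substitute your own numbers.

```python
def monthly_energy_cost(watts: float, hours_per_day: float,
                        price_per_kwh: float, days: int = 30) -> float:
    """Electricity cost for one month of use, in the currency of price_per_kwh."""
    kwh = watts * hours_per_day * days / 1000.0  # watt-hours -> kilowatt-hours
    return kwh * price_per_kwh


# Assumed: a 300 W draw under load, 2 hours a day, at $0.15 per kWh.
cost = monthly_energy_cost(300, 2, 0.15)
print(f"${cost:.2f} per month")
```

With these assumed inputs the estimate comes to $2.70 a month. Heavier use or a higher electricity price scales the figure linearly, and idle power draw is not included.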
