by ekidd 55 days ago
You can't realistically replace a frontier coding model on any local hardware that costs less than a nice house, and even then it's not going to be quite as good.

But if you don't need frontier coding abilities, there are several nice models that you can run on a video card with 24GB to 32GB of VRAM. (So a 5090 or a used 3090.) Try Gemma4 and Qwen3.5 with 4-bit quantization from Unsloth, and look at models in the 20B to 35B range. You can try before you buy if you drop $20 on OpenRouter. I have a setup like this that I built for $2500 last year, before things got expensive, and it's a nice little "home lab."
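
As a rough sketch of why 20B to 35B models at 4-bit fit on a 24GB to 32GB card: weight memory is roughly parameter count times bits per weight, divided by 8. The `headroom_gb` figure for KV cache and runtime overhead is a hypothetical placeholder, not a measured value, and real 4-bit quant formats usually store somewhat more than 4 bits per weight.

```python
def weight_gb(params_billion, bits_per_weight=4.0):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    # 1e9 params * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8


def fits(params_billion, vram_gb, bits_per_weight=4.0, headroom_gb=4.0):
    """True if weights plus an assumed headroom fit in VRAM.

    headroom_gb is a hypothetical allowance for KV cache and runtime
    overhead; the real number depends on context length and runtime.
    """
    return weight_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


for size in (20, 35):
    print(f"{size}B at 4-bit: ~{weight_gb(size)} GB of weights, "
          f"fits in 24 GB: {fits(size, 24)}")
```

Treat this as a first-pass filter only; long contexts can push the real memory use well past the assumed headroom.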

If you want to go bigger than this, you're looking at an RTX 6000 card, or a Mac Studio with 128GB to 512GB of RAM. These are outside your budget. Or you could look at a Mac Mini, DGX Spark, or Strix Halo. These mostly let you run bigger models, but much more slowly.

2 comments

Thanks. That is what I suspected. The 3090s in my area seem pretty expensive for a several-year-old secondhand card: they are the same price as a new 5080.

A 5090 is pretty expensive (~$4000) to justify over a $10-50 sub. I guess the nice thing is that the API side becomes "included", if I ever want to go that route. But if I compare a $40 GHCP sub against a $4000 graphics card to match it, the payoff on hardware alone is at about 8 years. If I add in electricity, the payoff is probably never.
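
The payoff arithmetic above can be sketched like this. The $4000 and $40/month figures come from the comment; the electricity figure in the second call is a hypothetical number chosen only to show the "never" case.

```python
def payback_years(hardware_cost, monthly_sub, monthly_power_cost=0.0):
    """Years until the hardware cost is recovered by skipped subscription fees.

    Returns None when running costs eat the whole saving, i.e. the
    hardware never pays for itself.
    """
    net_monthly_saving = monthly_sub - monthly_power_cost
    if net_monthly_saving <= 0:
        return None
    return hardware_cost / net_monthly_saving / 12


# Hardware only: $4000 card vs. a $40/month subscription.
print(round(payback_years(4000, 40), 1))  # 8.3

# Hypothetical: electricity costs as much per month as the subscription.
print(payback_years(4000, 40, monthly_power_cost=40))  # None
```

This ignores resale value and any subscription price changes, both of which would shift the result.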

Sure, the sub can go up in price, but the value proposition for self-running doesn't seem to make sense - especially if I can't at least match Sonnet on GHCP or something like that.

I hope to self-run some not-useless LLMs/agents at some point, but I think this market needs to stabilize first. I just don't like waiting.

For what it's worth, eBay in the US currently has some used 3090s for about $1,300, including some marked "Buy it now." I got mine used for about $1,000, and I'm really happy with it—it's a very solid gaming card for Steam on Linux (if you don't need ray tracing), and it allows me to experiment with models up to about 35B parameters. I'm not saying it's a good investment for you in particular, of course! But it's solid at that price, and you can just chuck it in any consumer gaming rig and get a really fun AI "home lab".

As for models, I'm really genuinely impressed with Gemma4 26B A4B and Qwen3.6 35B A3B right now. Between them, I've seen solid image analysis, good medium-image OCR on very tough images, very good understanding of short stories, good structured data extraction from documents, extremely good language translation, etc. If you wanted to build a custom tool which summarized your inbox/RSS feeds/local news every day, or extracted information from emails and entered it into a database, or automatically captioned images, those tasks are all viable locally. The quality of the results is up dramatically in the last 12 months. At this point, my old personal non-agentic LLM benchmarks are "saturated": All the current leading models score extremely well on literally anything I was asking last year.

It's the true agentic coding workflows where the big models really stand out. And those models are all large enough that the hardware needs to be amortized over enough users to keep it running 24 hours/day.

> or a Mac Studio with 128GB to 512GB of RAM. These are outside your budget.

An M3 Ultra with 80 GPU cores and 256GB of RAM is $7500. That's right at the edge of the budget, but it fits. If you can get an edu discount through a kid or friend, you're even better off!