| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ekidd 164 days ago

One easy way to test different models is purchase $20 worth of tokens from one of the Open Router-like sites. This will let you asks tons of questions and try out lots of models.

Realistically, the biggest models you can run at a reasonable price right now are quantized versions of things like the Qwen3 30B A3B family. A 4-bit quantized version fits in roughly 15GB of RAM. This will run very nicely on something like an Nvidia 3090. But you can also use your regular RAM (though it will be slower).

These models aren't competitive with GPT 5 or Opus 4.5! But they're mostly all noticeably better than GPT-4o, some by quite a bit. Some of the 30B models will run as basic agentic coders.

There are also some great 4B to 8B models from various organizations that will fit on smaller systems. A 8B model, for example, can be a great translator.

(If you have a bunch of money and patience, you can also run something like GPT OSS 120B or GLM 4.5 Air locally.)

3 comments

nl 163 days ago

I wrote https://tools.nicklothian.com/llm_comparator.html so you can compare different models.

OpenRouter gives you $10 credit when you sign up - stick your API key in and compare as many models as you want. It's all browser local storage.

link

kouteiheika 164 days ago

> (If you have a bunch of money and patience, you can also run something like GPT OSS 120B or GLM 4.5 Air locally.)

Don't need patience for these, just money. A single RTX 6000 Pro runs those great and super fast.

link

scotty79 163 days ago

> GPT OSS 120B

This one runs at perfectly servicable pace locally on a laptop 5090 with 64gb system ram with zero effort required. Just download ollama and select this model from the drop-down.

link

Muromec 164 days ago

Oh... 8 thousand of eurobucks for the thing.

link

cfn 164 days ago

Or 4 thousand for the NVIDIA RTX A6000 which also runs the 120b just fine (quantized).

link

sofixa 163 days ago

Or a single AMD Strix Halo with lots of RAM, which could be had before the RAM crisis for ~1.5k eur.

link

Haaargio 163 days ago

Or why not just buy a blackwell rack?

Runs everything today with bleeding edge performance.

Overall whats the difference between 8k or 30k?

link

kouteiheika 163 days ago

You jest, but there's a ton of people on /r/localLLaMA which have an RTX 6000 Pro. No one has a Blackwell rack.

As long as you have the money this hardware is easily accessible to normal people, unlike fancy server hardware.

link

cmrdporcupine 164 days ago

This is the answer. There's a half dozen sites that let you run these models by the token, and actually $20 is excessive. $5 will get you a long long way.

link