|
|
|
|
|
by ekidd
164 days ago
|
|
One easy way to test different models is purchase $20 worth of tokens from one of the Open Router-like sites. This will let you asks tons of questions and try out lots of models. Realistically, the biggest models you can run at a reasonable price right now are quantized versions of things like the Qwen3 30B A3B family. A 4-bit quantized version fits in roughly 15GB of RAM. This will run very nicely on something like an Nvidia 3090. But you can also use your regular RAM (though it will be slower). These models aren't competitive with GPT 5 or Opus 4.5! But they're mostly all noticeably better than GPT-4o, some by quite a bit. Some of the 30B models will run as basic agentic coders. There are also some great 4B to 8B models from various organizations that will fit on smaller systems. A 8B model, for example, can be a great translator. (If you have a bunch of money and patience, you can also run something like GPT OSS 120B or GLM 4.5 Air locally.) |
|
OpenRouter gives you $10 credit when you sign up - stick your API key in and compare as many models as you want. It's all browser local storage.