Hacker News
by roscas 25 days ago
Continue in VS Code with Ollama running (start it with "ollama serve") is great. Below are some of the offline models I'm using, but don't forget the qwen3.5 coder as well.

  ollama list
  NAME                ID            SIZE    MODIFIED
  laguna-xs.2:latest  ba9ecde43b0e  23 GB   12 hours ago
  nemotron3:33b       f6d8b7ff496c  27 GB   4 days ago
  qwen3.6:latest      07d35212591f  23 GB   6 weeks ago
  gemma4:e2b          7fbdbf8f5e45  7.2 GB  7 weeks ago
  gemma4:e4b          c6eb396dbd59  9.6 GB  7 weeks ago

You can download models from within Continue, or just run "ollama pull <name>" with a name picked from the model search on the ollama.com site. These run mostly on CPU, as my 3080 cannot load models larger than 10 GB, but the CPU speed is amazing: it outputs faster than I can read!
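For anyone wanting to check which of their own models would fit on the GPU, here is a rough sketch in Python. It parses the "ollama list" output from this post and compares each model's on-disk size with a VRAM budget. The function names (parse_models, fits_in_vram) are made up for illustration, and the 10 GB budget is taken from the 3080 figure above. Treat the result as optimistic: the size on disk is only a proxy, since the runtime also needs memory for context and buffers.

```python
import re

# Output of "ollama list" as pasted in this post.
OLLAMA_LIST = """\
NAME                ID            SIZE    MODIFIED
laguna-xs.2:latest  ba9ecde43b0e  23 GB   12 hours ago
nemotron3:33b       f6d8b7ff496c  27 GB   4 days ago
qwen3.6:latest      07d35212591f  23 GB   6 weeks ago
gemma4:e2b          7fbdbf8f5e45  7.2 GB  7 weeks ago
gemma4:e4b          c6eb396dbd59  9.6 GB  7 weeks ago
"""

# Conversion factors to GB for the units that appear in the SIZE column.
UNIT_TO_GB = {"MB": 1 / 1000, "GB": 1.0, "TB": 1000.0}


def parse_models(text):
    """Return {model name: size in GB} parsed from `ollama list` output."""
    sizes = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        match = re.match(r"(\S+)\s+(\S+)\s+([\d.]+)\s*(MB|GB|TB)\b", line)
        if match:
            name, _model_id, number, unit = match.groups()
            sizes[name] = float(number) * UNIT_TO_GB[unit]
    return sizes


def fits_in_vram(sizes, vram_gb):
    """Return {model name: True if the on-disk size is within vram_gb}."""
    return {name: size <= vram_gb for name, size in sizes.items()}


if __name__ == "__main__":
    sizes = parse_models(OLLAMA_LIST)
    # 10 GB is an assumption based on the 3080 mentioned above.
    for name, ok in fits_in_vram(sizes, vram_gb=10.0).items():
        where = "may fit on GPU" if ok else "will spill to CPU/RAM"
        print(f"{name:20s} {sizes[name]:5.1f} GB  {where}")
```

By this crude measure only the two gemma4 variants come in under 10 GB, which matches the observation that the larger models end up running mostly on CPU.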

1 comment

The new laptop has only 32 GB of DDR5 and an RTX 4070 with 8 GB of GDDR6, running Xubuntu, so Gemini recommended qwen2.5. I don’t think I want to run anything larger because, as you said, it would run on the CPU and system RAM. As it is, the 14B model will still spill over somewhat and not fit entirely into the GPU.
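The spill-over can be estimated with some back-of-the-envelope arithmetic, sketched here in Python. The numbers are assumptions rather than measurements: roughly 4.5 bits per weight stands in for a typical 4-bit quantization, and 1.5 GB is a guess for context cache and runtime buffers. The helper names (weights_gb, spill_gb) are hypothetical.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def spill_gb(params_billion, bits_per_weight, vram_gb, overhead_gb):
    """Approximate GB that will not fit in VRAM (0.0 if everything fits)."""
    needed = weights_gb(params_billion, bits_per_weight) + overhead_gb
    return max(0.0, needed - vram_gb)


if __name__ == "__main__":
    # Assumed values: 14B parameters, ~4.5 bits per weight, 8 GB of VRAM,
    # and 1.5 GB of overhead for context cache and buffers.
    weights = weights_gb(14, 4.5)
    spill = spill_gb(14, 4.5, vram_gb=8.0, overhead_gb=1.5)
    print(f"weights: {weights:.2f} GB, estimated spill: {spill:.2f} GB")
```

Under these assumptions the weights alone nearly fill the 8 GB card, so even a modest amount of overhead pushes part of the model into system RAM.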