| I've been using Claude Code regularly at work for several months, and I successfully used it for a small personal project (a website) not long ago. Last weekend, I explored self-hosting for the first time. Does anyone have a similar experience of having thoroughly used CC/Codex/whatever and also have an analogous self-hosted setup that they're somewhat happy with? I'm struggling a bit. I have 32GB of DDR5 (seems inadequate nowadays), an AMD 7800X3D, and an RTX 4090. I'm using Windows but I have WSL enabled. I tried a few combinations of ollama, docker desktop model runner, pi-coding-agent and opencode; and for models, I think I tried a few variants each of Gemma 4, Qwen, GLM-5.1. My "baseline" RAM usage was so high from the handful of regular applications that IIRC it wasn't enough to use the best models; e.g., I couldn't run Gemma4-31B. Things work okay in a Windows-only setup, though the agent struggled to get file paths correct. I did have some success running pi/opencode in WSL and running ollama and the model via docker desktop. In terms of actual performance, it was painfully slow compared to the throughput I'm used to from CC, and the tooling didn't feel as good as the CC harness. Admittedly I didn't spend long enough actually using it after fiddling with setup for so long, it was at least a fun experiment. |
I use llama-server that comes with llama.cpp instead of using ollama. Here are the exact settings I use.
llama-server -ngl 99 -c 192072 -fa on --cache-type-k q4_0 --cache-type-v q4_0 --host 0.0.0.0 --sleep-idle-seconds 300 -m Qwen3.5-27B-Q4_K_M.gguf