Yes. I run local models (Qwen3.6-27B), and IMHO the massive level-up was the agents and skills files that I've worked on.
Basically I run this flow:
Brainstorming > Create Spec > Review Spec* > Create Plans > Review Plan* > Execute Plan (in subagents) > Review Against Plan > Code Review* > Open PR > Finish Plan (marks plan files done)
* Each review step marked with an asterisk uses a larger paid LLM, right now DeepSeek V4 Pro. Having it do the reviews catches a lot of small things, and now I'm effectively one-shotting any task I give it.
And it's not costing me much at all, just those three reviews. I could use a free model like Gemini but I'm happy with what I've got.
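If it helps to see the routing written down, here's a rough sketch in Python. It isn't my actual agents/skills setup, and the names (Stage, FLOW, model_for, routing_table) are made up for illustration. It only shows which step goes to which model.

```python
from dataclasses import dataclass

# Model names are taken from the post; everything else here is hypothetical.
LOCAL_MODEL = "Qwen3.6-27B"       # runs on the local card
REVIEW_MODEL = "DeepSeek V4 Pro"  # larger paid model, used only for * steps


@dataclass(frozen=True)
class Stage:
    name: str
    paid_review: bool = False  # True for the steps marked with an asterisk


FLOW = [
    Stage("Brainstorming"),
    Stage("Create Spec"),
    Stage("Review Spec", paid_review=True),
    Stage("Create Plans"),
    Stage("Review Plan", paid_review=True),
    Stage("Execute Plan (in subagents)"),
    Stage("Review Against Plan"),
    Stage("Code Review", paid_review=True),
    Stage("Open PR"),
    Stage("Finish Plan"),
]


def model_for(stage: Stage) -> str:
    """Pick the paid reviewer for * steps, the local model otherwise."""
    return REVIEW_MODEL if stage.paid_review else LOCAL_MODEL


def routing_table(flow=FLOW):
    """Return (stage name, model name) pairs in flow order."""
    return [(stage.name, model_for(stage)) for stage in flow]


if __name__ == "__main__":
    for name, model in routing_table():
        print(f"{name:30} -> {model}")
```

Only three of the ten steps hit the paid model, which is why the bill stays small.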
Sure. It's just an old i7-8700 (non-K) with 64 GB RAM, running Proxmox. But recently I put an AMD Radeon AI PRO R9700 in there, which is a 32 GB inference-focused card. Think of it as a 32 GB version of a 9070 XT.
All the inference happens on that card, so the CPU/RAM is there for the other containers.
I'll eventually swap the motherboard and CPU for something better, so I can fit 1 or 3 more of those cards.
Why not NVIDIA? 32 GB on team green means spending crazy money, and I can get 4 R9700s for the cost of one 32 GB 5090.