Hacker News
by
denn-gubsky
20 hours ago
Try the qwen3-coder or qwen3-coder-next models, which should fit your configuration. These are mixture-of-experts models, so only the active experts may need to be loaded onto the GPU.
limondas
18 hours ago
Thanks for your reply, but it's too big for my PC. On my PC, models of around 1.5 GB get about 20 tokens/s, which is too slow for an agentic workflow.
denn-gubsky
10 hours ago
Try the latest gemma4:12b. It fits into 16 GB with a 256K context window.