| HN Mirror

Nope it works great with opencode as a agent, you can build a game or things like that. It works. The trick is a mix among the quantization I used, which is very asymmetric, and the fact that I guess DeepSeek v4 Flash tolerates extreme quantizations better than anything I saw in the past.

What I used was up/gate of routed models, IQ2_XXS, out -> Q2_K, then I quantized routing, projections, shared experts to Q8. The trick is that the very sensible parts are a small amount of the weights, and they are kept very high quality.