|
Just want to echo the recommendation for qwen3.5:9b. This is a smol, thinking, agentic tool-using, text-image multimodal creature, with very good internal chains of thought. CoT can be sometimes excessive, but it leads to very stable decision-making process, even across very large contexts -something we haven't seen models of this size before. What's also new here, is VRAM-context size trade-off: for 25% of it's attention network, they use the regular KV cache for global coherency, but for 75% they use a new KV cache with linear(!!!!) memory-token-context size expansion! which means, eg ~100K token -> 1.5gb VRAM use -meaning for the first time you can do extremely long conversations / document processing with eg a 3060. Strong, strong recommend. |
I like the idea of finding practical uses for it, but so far haven't managed to be creative enough. I'm so accustomed to using these things for programming.