by oktoberpaard
335 days ago
It gives weird results for me. I'm running Qwen3-32B at Q4_K_M with a 32K context length and an 8-bit KV cache, fully offloaded to 24 GB of VRAM. According to this calculator, that should be impossible by a large margin, yet it works for me. Edit: this might be because I have flash attention enabled in Ollama.
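For a rough sanity check, here is a back-of-the-envelope estimate of the KV cache size alone. The model shape is an assumption, not something stated above: Qwen3-32B is taken to have 64 layers, 8 KV heads (grouped-query attention) and a head dimension of 128, so check the model's own config before relying on it. "8-bit" is treated as exactly 1 byte per element, which ignores the small per-block scale overhead of quantized cache formats, and the Q4_K_M weights come on top of this figure. The second line is a hypothetical comparison showing how much larger the cache would be without GQA and at fp16.

```python
# Back-of-the-envelope KV-cache size for a decoder-only transformer.
# Architecture numbers are ASSUMED Qwen3-32B values; verify against the model config.
GIB = 1024 ** 3


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: float) -> float:
    """Bytes needed to cache keys and values for n_ctx tokens."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Assumed shape: 64 layers, 8 KV heads (GQA), head_dim 128, 32K context, 1 byte/element.
gqa_8bit = kv_cache_bytes(64, 8, 128, 32 * 1024, 1.0)
# Hypothetical: every one of 64 attention heads keeps its own K/V, stored as fp16.
mha_fp16 = kv_cache_bytes(64, 64, 128, 32 * 1024, 2.0)

print(f"GQA, 8-bit cache:   {gqa_8bit / GIB:.1f} GiB")   # 4.0 GiB
print(f"no GQA, fp16 cache: {mha_fp16 / GIB:.1f} GiB")   # 64.0 GiB
```

Under these assumptions the 8-bit cache is about 4 GiB, so whether everything fits in 24 GB depends mostly on the size of the weights and runtime overhead, not on the cache.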