I had an annoying issue in a setup with two Nvidia L4 cards where trying to run the MoE versions to get decent performance just didn't work with Ollama, seems the same as these:
So for now I'm back to Qwen 3 30B A3B, kind of a bummer, because the previous model is pretty fast but kinda dumb, even for simple tasks like on-prem code review!