Nemotron-3-Nano-30B-A3B[0][1] is a very impressive local model. It is good with tool calling and works great with llama.cpp/Visual Studio Code/Roo Code for local development.
It doesn't get a ton of attention on /r/LocalLLaMA but it is worth trying out, even if you have a relatively modest machine.
Deep SSMs, including the entire S4 to Mamba saga, are a very interesting alternative to transformers. In some of my genomics use cases, Mamba has been easier to train and scale over large context windows, compared to transformers.
It was good for like, one month. Qwen3 30b dominated for half a year before that, and GLM-4.7 Flash 30b took over the crown soon after Nemotron 3 Nano came out. There was basically no time period for it to shine.
It is still good, even if not the new hotness. But I understand your point.
It isn't as though GLM-4.7 Flash is significantly better, and honestly, I have had poor experiences with it (and yes, always the latest llama.cpp and the updated GGUFs).
I find the Q8 runs a bit more than twice as fast as gpt-120b since I don’t have to offload as many MoE layers, but is just about as capable if not better.
Do they have a good multilingual embedding model? Ideally, with a decent context size like 16/32K. I think Qwen has one at 32K. Even the Gemma contexts are pretty small (8K).