

by breput 145 days ago

Nemotron-3-Nano-30B-A3B[0][1] is a very impressive local model. It is good with tool calling and works great with llama.cpp/Visual Studio Code/Roo Code for local development.

It doesn't get a ton of attention on /r/LocalLLaMA but it is worth trying out, even if you have a relatively modest machine.

[0] https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B...

[1] https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF

5 comments

bhadass 145 days ago

Some of NVIDIA's models also tend to have interesting architectures. For example, they use the Mamba architecture instead of relying purely on transformers: https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-t...


nextos 145 days ago

Deep SSMs, including the entire S4 to Mamba saga, are a very interesting alternative to transformers. In some of my genomics use cases, Mamba has been easier to train and scale over large context windows, compared to transformers.
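For readers who haven't met SSMs, here is a minimal sketch of the linear recurrence they are built on, assuming a fixed diagonal state matrix. The function name `ssm_scan` and the parameters `a`, `b`, `c` are illustrative, not taken from any library. Mamba itself makes these parameters depend on the input (the "selective" part) and uses a parallel scan, so treat this as a simplified picture of why the cost grows linearly with sequence length rather than as Mamba's implementation.

```python
def ssm_scan(a, b, c, xs):
    """Run a diagonal linear state space model over a scalar input sequence.

    Recurrence (per state dimension i):
        h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
        y_t    = sum_i c[i] * h_t[i]

    a, b, c are fixed here; this is an assumption for illustration.
    """
    h = [0.0] * len(a)  # hidden state, fixed size regardless of sequence length
    ys = []
    for x in xs:  # one constant-cost update per token
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


if __name__ == "__main__":
    # Impulse response of a one-dimensional state with decay 0.5.
    print(ssm_scan([0.5], [1.0], [1.0], [1.0, 0.0, 0.0]))
```

The state `h` never grows with the sequence, which is the property that makes long contexts cheaper than attention's pairwise comparisons.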


jychang 145 days ago

It was good for like, one month. Qwen3 30b dominated for half a year before that, and GLM-4.7 Flash 30b took over the crown soon after Nemotron 3 Nano came out. There was basically no time period for it to shine.


breput 145 days ago

It is still good, even if not the new hotness. But I understand your point.

It isn't as though GLM-4.7 Flash is significantly better, and honestly, I have had poor experiences with it (and yes, always the latest llama.cpp and the updated GGUFs).


ThrowawayTestr 145 days ago

Genuinely exciting to be around for this. Reminds me of the time when computers were said to be obsolete by the time you drove them home.


binary132 145 days ago

I recently tried GLM-4.7 Flash 30b and didn’t have a good experience with it at all.


breput 145 days ago

It feels like GLM has either a bit of a fan club or maybe some paid supporters...


binary132 145 days ago

I find the Q8 runs a bit more than twice as fast as gpt-120b since I don’t have to offload as many MoE layers, and it is just about as capable, if not better.


superjan 145 days ago

Oh those ghastly model names. https://www.smbc-comics.com/comic/version


deskamess 144 days ago

Do they have a good multilingual embedding model? Ideally, with a decent context size like 16/32K. I think Qwen has one at 32K. Even the Gemma contexts are pretty small (8K).
