by hibijibies 48 days ago
Hi, I'm from Embedl (embedl.com / https://huggingface.co/embedl) and we made the hfviewer. Could you elaborate on why the Nemotron model visualization might be incorrect? We run a number of passes to get the graph structure from the HuggingFace config, sometimes including exporting the model with torch.export and then recombining it to make the view meaningful. We would love to fix any issues and make the viewer better.

The Nemotron model has attention layers interspersed with the Mamba layers, but I didn't see any attention layers in the visualization. It looks like they are present but show up as blocks with an RMSNorm followed by two sequential linear layers. The first few resolution levels aren't very useful either.
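
One way to cross-check a view like this might be to compare it against the expected layer order. Below is a minimal sketch, assuming the layer order is available as a pattern string with one symbol per block. The symbol mapping, the function name summarize_pattern, and the example pattern are all made up for illustration and should be checked against the actual model config.

```python
from collections import Counter

# Assumed symbol-to-block mapping (hypothetical); verify against the model's config.
BLOCK_NAMES = {"M": "mamba", "*": "attention", "-": "mlp"}


def summarize_pattern(pattern: str) -> dict:
    """Count block types and list the layer indices where attention blocks sit."""
    unknown = set(pattern) - set(BLOCK_NAMES)
    if unknown:
        raise ValueError(f"unrecognized symbols: {sorted(unknown)}")
    counts = Counter(BLOCK_NAMES[ch] for ch in pattern)
    attention_at = [i for i, ch in enumerate(pattern) if ch == "*"]
    return {"counts": dict(counts), "attention_at": attention_at}


# Made-up pattern for illustration only, not the real Nemotron layout.
example = summarize_pattern("M-M-M*-M-M*-")
print(example["counts"])
print(example["attention_at"])
```

If the viewer shows no attention block at the indices listed in attention_at, that would point to those layers being rendered as something else, such as the RMSNorm plus linear blocks described above.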