

by jychang 200 days ago

Mamba-based LLMs aren't even close to novel, though. IBM's been doing this since forever [1].

Also, you're off on DeepSeek V3.2's param count: the full model is 685B in size with the MTP layer.

I don't think there's anything interesting here other than "I guess AMD put out a research paper", and it's not cutting edge when DeepSeek or even IBM is running laps around them.

[1] Here's a news article from April, although IBM had been doing it for a long time before that: https://research.ibm.com/blog/bamba-ssm-transformer-model


credit_guy 199 days ago

It's not cutting edge, so what? Your point is that nobody should publish anything unless it is cutting edge?


jychang 199 days ago

Yeah, that's the point of publishing. You get scooped, you lose.


credit_guy 199 days ago

This wasn’t published, it was just posted to arXiv.
