by robofanatic 130 days ago

> Mamba-3 is a new state space model (SSM) designed with inference efficiency as the primary goal — a departure from Mamba-2, which optimized for training speed. The key upgrades are a more expressive recurrence formula, complex-valued state tracking, and a MIMO (multi-input, multi-output) variant that boosts accuracy without slowing down decoding.

Why can’t they simply say -

Mamba-3 focuses on being faster and more efficient when making predictions, rather than just being fast to train like Mamba-2.

7 comments

esquire_900 130 days ago

This is sort of what their first sentence states? Except your line implies that Mamba-3 is fast in both training and inference, whereas they imply they are focusing on inference and giving up training speed for it.

It's a nice opening as it is imo

cubefox 130 days ago

They don't say anything about dropping training speed.

estearum 130 days ago

> a departure from Mamba-2, which optimized for training speed.

?

cubefox 130 days ago

Yes? Mamba-2 optimized for training speed compared to Mamba-1. Mamba-3 adds optimization for inference. These are pretty much version numbers.

i000 129 days ago

Agreed. What you wrote was probably the input, what we see is the LLM output with the directive to "make us sound smart, put gratuitous em-dash"

E-Reverance 130 days ago

The first sentence basically does though, no?

robofanatic 130 days ago

Of course, my only objection was the language. LLMs are now old enough to leave the jargon behind and talk in simple, easy-to-understand terms.

oersted 130 days ago

I’d argue the opposite, the terminology is fairly mainstream by now and “inference” has a much more specific sense than “making predictions”.

mufasachan 130 days ago

The blog is technical, so technical terms in the TL;DR seem relevant to me.

renewiltord 130 days ago

Found the guy who made the Windows error messages say “Your computer did an oopsie :(” instead of including any useful information.

arendtio 130 days ago

I don't get the downvotes, as I had trouble understanding the intro as well. It seems it was written for a very specific audience.

qeternity 130 days ago

Yes, it is written for a specific audience.

That is not a reason for snark.

As other commenters have noted, it’s well written.

magicalhippo 130 days ago

> I don't get the downvotes

Because the blog post is a technical one and the intro contains very common jargon, and the proposed alternative was wrong.

camillomiller 130 days ago

I don’t know why you’re being downvoted. As a longtime editor, I find your version immensely better. Looks like the original was probably not human-written.

stavros 130 days ago

Why would the simpler version be better for a technical audience?