| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by oatsandsugar 612 days ago

On the page it states:

Our novel shared-attention architecture allows more parameters to be allocated to the Mamba2 backbone. In turn, the shared transformer block preserves the rich cross-sequence dependencies of the attention computation.

so sounds like it is transformer based?