| HN Mirror



	by wrsh07 776 days ago
	I think any next-token predictor will benefit. IIUC, Mamba is a next-token predictor. I just skimmed The Gradient article, but if their only change is swapping out the transformer block for the Mamba block, I don't think it's already using this optimization.