by yorwba 331 days ago
Their architecture uses a mix of Transformer and Mamba layers. The question isn't whether Mamba will replace Transformers, but whether it'll become part of the toolkit or get abandoned like many other promising approaches.