Hacker News
by sigmoid10 843 days ago
True, but bear in mind the Mamba preprint is less than three months old. A lot of people are probably experimenting with these ideas right now, and training a completely new, large foundation model with a different architecture will take a significant amount of time.