Hacker News new | ask | show | jobs
by godelski 843 days ago
Yes and no.

The thing is that Mamba is not perfect. There's no neural architecture to rule them all, if you will. I think the bigger issue is that we more act like there is and get on bandwagons. Let me give a clearer example from the past so we can see. The predecessor to DDPM (the work that kicked off the diffusion model era) was published in 2015[0], only a year after GANs[1]. Diffusion then showed promise but why did DDPM come out in 2020[2]? Because everyone was working on GANs. GANs produced far better images and diffusion (still is) was a lot more computationally intensive. Plus, all the people working on these diffusion models were in the same camp as those working on Normalizing Flows and other density based models, and fewer people are interested in understanding density estimation.

So the entire problem is that the community hopped on a singular railroad for research direction. There was still working going on in that direction but it wasn't nearly getting the attention that GANs got. It's hard to know if things were blocked from publication because they weren't as good as GANs. I can say from personal experience I had a Flow publication blocked because reviewers were concerned with its quality compared to GANs (this was 2019/2020, this paper will never be published because now it is even hard to publish a GAN work).

So yes and no because there is certainly railroading happening but there are also real critiques to Mamba. But what people often forget is that it is incredibly naive to compare new methods to existing methods on a direct one-to-one comparison. You're comparing something that has hundreds of hours to thousands of hours from a handful to a few dozen eyes against works with millions of hours and millions of eyes. Evaluation is just a really fucking hard thing to do but it is easy to just look at some benchmarks, even if they don't mean much. This is a fairly generalization notion though, so take the lesson to heart. But Mamba seems a bit different than our diffusion/GAN story, in that it is getting more attention than diffusion did in the 2016-2019.

[0] https://arxiv.org/abs/1503.03585

[1] https://arxiv.org/abs/1406.2661

[2] https://arxiv.org/abs/2006.11239