|
|
|
|
|
by jxmorris12
845 days ago
|
|
In case people are wondering why Mamba is exciting: There's this idea in AI right now that "scaling" models to be bigger and train on more data always makes them better. This has led to a science of "scaling laws" which study just how much bigger models need to be and how much data we need to train them on to make them a certain amount better. The relationship between model size, training data size, and performance turns out to be quite predictable. Transformers are great because they can continue scaling and giving us better performance – unlike, we think, RNNs. Probably the most exciting thing about Mamba is the claim that it can be a bit smaller, and train on a bit less data, and still provide better performance than the equivalent Transformer, especially at longer sequence lengths. For more info, see the scaling laws plot in Figure 4 of the Mamba paper: https://arxiv.org/abs/2312.00752 |
|
https://openreview.net/forum?id=TKIFuQHHECj#
I believe there is a lot of herding going on due to the influence of people who had compute to play around with than deeply insightful or principled exploration of networks.