by rowanG077 739 days ago
Totally unclear what this is. I scrolled through the readme and it didn't even mention once what it does.
3 comments

It's an LLM architecture competing with transformers: https://arxiv.org/abs/2312.00752

Proponents usually highlight its inference performance, in particular that it scales linearly with the number of input tokens.

I really disagree with pigeonholing it as an LLM architecture! It is much more general than that as I mentioned in another comment in this post [1] (and of course as mentioned in the original paper which you linked).

[1] https://news.ycombinator.com/item?id=40616181

It totally mentions what it does. It takes the sentence "I have a dream that" and extends it to: "I have a dream that I will be able to see the sunrise in the morning."

It's an LLM.

It’s much more than just an LLM. The mamba architecture is often used in the backbone of an LLM but you can use it more generally as a linear-time (as opposed to quadratic-time) sequence modeling architecture (as per the original paper’s title, which is cited in the linked repo). It is much closer to a convolutional network or an RNN (it has bits of both) than to a transformer architecture. It is based off the notion of state spaces (with a twist).
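
To make the "bits of both" point concrete, here is a minimal sketch of a plain linear state-space layer, not Mamba itself. It assumes a scalar state and fixed, made-up parameters `a`, `b`, `c` (the function names are also just for illustration). The same output can be computed step by step like an RNN or as a convolution with a kernel derived from the parameters:

```python
# Minimal sketch of a linear time-invariant state-space layer (NOT Mamba itself).
# Scalar state for clarity; a, b, c are illustrative parameters, not from the paper.

def ssm_recurrent(x, a, b, c):
    """RNN view: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t. One pass over the input."""
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


def ssm_convolutional(x, a, b, c):
    """CNN view: y_t = sum_k K[k] * x[t-k], with kernel K[k] = c * a**k * b."""
    kernel = [c * (a ** k) * b for k in range(len(x))]
    return [
        sum(kernel[k] * x[t - k] for k in range(t + 1))
        for t in range(len(x))
    ]


x = [1.0, 0.0, 2.0, -1.0]
y_rnn = ssm_recurrent(x, a=0.5, b=1.0, c=2.0)
y_cnn = ssm_convolutional(x, a=0.5, b=1.0, c=2.0)
print(y_rnn)  # [2.0, 1.0, 4.5, 0.25]
print(y_cnn)  # same values
```

The recurrent form touches each input once, which is where the linear-time claim comes from. The naive convolution loop above is quadratic and is only there to show the two views agree. As far as I understand the paper, the "twist" is that Mamba makes these parameters depend on the input, so the fixed-kernel convolution view no longer applies and it relies on a scan instead.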

For instance, I use Mamba to build surrogate models of physics-based building energy models. They can generate 15-min interval data for heating, cooling, electricity, and hot water usage of any building in the US from building characteristics, weather time series, and occupancy time series.

It has many other non-NLP applications.

Would love to hear more about that building energy modelling example. Have you done a writeup you could share?

The Mamba application is my current research project so I haven’t published anything yet. But the basic idea is to create a latent representation of the static features, repeat the latent vector to form a time series, concatenate with the weather/occupancy time series, run through mamba layers, and bob’s your uncle. Shoot me an email (in my bio) if you would like to chat more!

I can also share my master’s thesis which is similar but using CNN layers rather than Mamba and only for monthly predictions rather than 15-min interval data. There are some other architectural differences but the basics are the same. That work is also globally robust.

As you can imagine, the current work I am doing at a much higher resolution is a big step up, and Mamba so far is working out great.
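
The input assembly described above (static latent, repeat over time, concatenate with the time series) can be sketched roughly like this. It only covers building the input sequence; the encoder that produces the latent and the Mamba layers themselves are not shown. All names, feature choices, and sizes are hypothetical, and plain lists stand in for tensors:

```python
# Sketch of the input assembly only; the static-feature encoder and the
# Mamba layers are not shown. All names and sizes here are hypothetical.

def build_sequence(static_latent, weather, occupancy):
    """Repeat the static latent at every time step and concatenate it
    with that step's weather and occupancy features."""
    if len(weather) != len(occupancy):
        raise ValueError("weather and occupancy must have the same length")
    return [
        list(static_latent) + list(w_t) + list(o_t)
        for w_t, o_t in zip(weather, occupancy)
    ]


# Stand-in for an encoded building (output of some encoder, assumed).
static_latent = [0.1, -0.3, 0.7]
# One row per 15-min step; the two weather columns are made-up examples.
weather = [[5.0, 0.8], [5.5, 0.7], [6.1, 0.7], [6.4, 0.6]]
occupancy = [[0.0], [0.0], [0.2], [0.5]]

seq = build_sequence(static_latent, weather, occupancy)
# seq has one row per time step, each of width 3 + 2 + 1 = 6,
# and would be what gets fed to the sequence layers.
print(len(seq), len(seq[0]))  # 4 6
```

In a real framework this would be a broadcast or repeat along the time axis followed by a concatenation on the feature axis, but the shapes work out the same way.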

Can I see your thesis?

I'm currently learning about machine learning and digital twins but don't really know where to start.

Is there a good, easy tutorial?

Completely - I assumed it was an implementation of https://github.com/mamba-org/mamba

I also assumed that "a pure NumPy implementation" meant that it was built purely with numpy, which it isn't smh