| HN Mirror



	by rowanG077 739 days ago
	Totally unclear what this is. I scrolled through the readme and it didn't even mention once what it does.

3 comments

sva_ 739 days ago

It's an LLM architecture competing with transformers: https://arxiv.org/abs/2312.00752

Proponents of it usually highlight its inference performance, in particular its linear scaling in the number of input tokens.
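To make the linear-scaling point concrete, here is a minimal NumPy sketch of a simplified selective state-space scan in the spirit of the Mamba paper. It is not the linked repo's code. It assumes a single input channel, a diagonal state matrix `A`, and hypothetical weights `W_delta`, `W_B`, `W_C` that make the step size and the `B`/`C` projections depend on the current input. Each token costs a fixed amount of work on a fixed-size state, so a length-`L` sequence takes O(L) time, whereas full self-attention compares every token with every other token (O(L²)).

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return np.logaddexp(0.0, z)

def selective_scan(x, A, W_delta, W_B, W_C):
    """Simplified selective SSM scan for one input channel (illustrative only).

    x:        (L,) input sequence
    A:        (N,) diagonal state matrix, negative entries for stability
    W_delta:  hypothetical scalar weight giving an input-dependent step size
    W_B, W_C: hypothetical (N,) weights giving input-dependent B_t and C_t
    Returns y: (L,) output sequence.
    """
    x = np.asarray(x, dtype=float)
    h = np.zeros_like(A, dtype=float)  # fixed-size hidden state
    y = np.empty(len(x))
    for t, x_t in enumerate(x):        # a single pass over the sequence: O(L)
        delta = softplus(W_delta * x_t)  # input-dependent step size
        B_t = W_B * x_t                  # input-dependent input projection
        C_t = W_C * x_t                  # input-dependent output projection
        A_bar = np.exp(delta * A)        # zero-order-hold discretization of A
        B_bar = delta * B_t              # simplified (Euler-style) discretization of B
        h = A_bar * h + B_bar * x_t      # constant-cost state update
        y[t] = C_t @ h
    return y

rng = np.random.default_rng(0)
N = 4
A = -np.arange(1, N + 1, dtype=float)
x = rng.standard_normal(16)
y = selective_scan(x, A, 0.5, rng.standard_normal(N), rng.standard_normal(N))
print(y.shape)  # (16,)
```

During generation only `h` has to be carried from one token to the next, which is why per-token inference cost does not grow with context length in this kind of model.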


szvsw 739 days ago

I really disagree with pigeonholing it as an LLM architecture! It is much more general than that as I mentioned in another comment in this post [1] (and of course as mentioned in the original paper which you linked).

[1] https://news.ycombinator.com/item?id=40616181


wodenokoto 739 days ago

It totally mentions what it does. It takes the sentence "I have a dream that" and extends it to: "I have a dream that I will be able to see the sunrise in the morning."

It's an LLM.


szvsw 739 days ago

It’s much more than just an LLM. The Mamba architecture is often used as the backbone of an LLM, but more generally it is a linear-time (as opposed to quadratic-time) sequence modeling architecture, as the original paper’s title says (the paper is cited in the linked repo). It is much closer to a convolutional network or an RNN (it has bits of both) than to a transformer. It is based on the notion of state spaces (with a twist).
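The "bits of both" remark can be illustrated with a plain linear time-invariant SSM, the S4-style setting that Mamba builds on. This sketch deliberately leaves out Mamba's input-dependent twist. Assuming a diagonal `A` and fixed `B`, `C`, and step size `delta` (all illustrative values), the same outputs can be computed either as a recurrence (the RNN view) or as a causal convolution with kernel `K[k] = C · A_bar^k · B_bar` (the CNN view).

```python
import numpy as np

def discretize(A, B, delta):
    """Zero-order-hold discretization for a diagonal A given as a vector of nonzero entries."""
    A_bar = np.exp(delta * A)
    B_bar = (A_bar - 1.0) / A * B  # exact ZOH formula for diagonal A
    return A_bar, B_bar

def ssm_recurrent(x, A_bar, B_bar, C):
    """RNN view: step through the sequence with a fixed-size state, O(L)."""
    h = np.zeros_like(A_bar)
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = A_bar * h + B_bar * x_t
        y[t] = C @ h
    return y

def ssm_convolutional(x, A_bar, B_bar, C):
    """CNN view: precompute the kernel K[k] = sum_n C[n] * A_bar[n]**k * B_bar[n], then convolve."""
    L = len(x)
    powers = A_bar[None, :] ** np.arange(L)[:, None]  # (L, N) table of A_bar^k
    K = powers @ (C * B_bar)                           # (L,) convolution kernel
    # np.convolve is a direct convolution; an FFT-based one would be used for long sequences.
    return np.convolve(x, K)[:L]                       # keep only the causal part

rng = np.random.default_rng(0)
A = -np.linspace(0.5, 2.0, 4)
A_bar, B_bar = discretize(A, rng.standard_normal(4), delta=0.1)
C = rng.standard_normal(4)
x = rng.standard_normal(32)
print(np.allclose(ssm_recurrent(x, A_bar, B_bar, C), ssm_convolutional(x, A_bar, B_bar, C)))  # True
```

Once `delta`, `B`, and `C` depend on the input (Mamba's selectivity), the kernel is no longer fixed and the convolution shortcut no longer applies; the Mamba paper computes the recurrence with a hardware-aware parallel scan instead.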

I use Mamba, for instance, to build surrogate models of physics-based building energy models, which can generate 15-min interval data for heating, cooling, electricity, and hot water usage of any building in the US from building characteristics, weather time series, and occupancy time series.

It has many other non-NLP applications.


Ddav 739 days ago

Would love to hear more about that building energy modelling example. Have you done a writeup you could share?


szvsw 739 days ago

The Mamba application is my current research project, so I haven’t published anything yet. But the basic idea is to create a latent representation of the static features, repeat the latent vector to form a time series, concatenate it with the weather/occupancy time series, run the result through Mamba layers, and Bob’s your uncle. Shoot me an email (in my bio) if you would like to chat more!
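The pipeline described above can be sketched as a NumPy forward pass. Everything here is hypothetical: the feature counts, the one-layer encoder, the four output channels, and especially the sequence layer, which is a simple per-channel diagonal linear SSM standing in for real Mamba layers. It is not the commenter's unpublished model, only an illustration of the encode, repeat, concatenate, sequence-model data flow with random weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, for illustration only.
N_STATIC = 10   # static building characteristics
N_TS = 5        # weather + occupancy channels per timestep
D_LATENT = 8    # size of the static-feature embedding
D_MODEL = 16    # channel width inside the sequence layer
N_STATE = 4     # SSM state size per channel
N_OUT = 4       # heating, cooling, electricity, hot water

def encode_static(s, W, b):
    """Map static features to a latent vector (hypothetical one-layer MLP)."""
    return np.tanh(s @ W + b)

def diagonal_ssm_layer(u, A_bar, B_bar, C):
    """Placeholder for a Mamba layer: an independent diagonal linear SSM per channel.

    u: (L, D) input; A_bar, B_bar, C: (D, N). Returns (L, D).
    """
    L, D = u.shape
    h = np.zeros_like(A_bar)
    y = np.empty((L, D))
    for t in range(L):
        h = A_bar * h + B_bar * u[t][:, None]  # update every channel's state
        y[t] = np.sum(C * h, axis=1)           # read out one value per channel
    return y

def surrogate_forward(static, timeseries, params):
    L = timeseries.shape[0]
    z = encode_static(static, params["W_enc"], params["b_enc"])  # (D_LATENT,)
    z_seq = np.repeat(z[None, :], L, axis=0)                      # latent repeated over time: (L, D_LATENT)
    u = np.concatenate([z_seq, timeseries], axis=1)               # (L, D_LATENT + N_TS)
    u = u @ params["W_in"]                                        # project to (L, D_MODEL)
    u = u + diagonal_ssm_layer(u, params["A_bar"], params["B_bar"], params["C"])  # residual sequence layer
    return u @ params["W_out"]                                    # (L, N_OUT)

params = {
    "W_enc": rng.standard_normal((N_STATIC, D_LATENT)) * 0.1,
    "b_enc": np.zeros(D_LATENT),
    "W_in": rng.standard_normal((D_LATENT + N_TS, D_MODEL)) * 0.1,
    "A_bar": np.exp(-rng.uniform(0.01, 0.5, (D_MODEL, N_STATE))),  # decay factors in (0, 1)
    "B_bar": rng.standard_normal((D_MODEL, N_STATE)) * 0.1,
    "C": rng.standard_normal((D_MODEL, N_STATE)) * 0.1,
    "W_out": rng.standard_normal((D_MODEL, N_OUT)) * 0.1,
}

# One day at 15-minute resolution is 96 timesteps.
static = rng.standard_normal(N_STATIC)
timeseries = rng.standard_normal((96, N_TS))
pred = surrogate_forward(static, timeseries, params)
print(pred.shape)  # (96, 4)
```

Repeating the static latent at every timestep lets a purely sequential layer condition each step on the building's fixed characteristics without any extra mechanism.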

I can also share my master’s thesis which is similar but using CNN layers rather than Mamba and only for monthly predictions rather than 15-min interval data. There are some other architectural differences but the basics are the same. That work is also globally robust.

As you can imagine, the current work I am doing at a much higher resolution is a big step up, and Mamba so far is working out great.


ahmadmijot 739 days ago

Can I see your thesis?

I'm currently learning about machine learning and digital twins but don't really know where to start.


blagie 739 days ago

Is there a good, easy tutorial?


piqufoh 739 days ago

Completely - I assumed it was an implementation of https://github.com/mamba-org/mamba

I also assumed that "a pure NumPy implementation" meant it was built purely with NumPy, which it isn't, smh.
