HN Mirror



	by alchemist1e9 614 days ago
	What can be used to run it? I had imagined Mamba-based models need different inference code/software than the other models.

3 comments

gbickford 614 days ago

If you look in the `config.json`[1], it shows `Zamba2ForCausalLM`. You can do inference with a version of the transformers library that supports that architecture.

The model card states that you have to use their fork of transformers.[2]

1. https://huggingface.co/Zyphra/Zamba2-7B-Instruct/blob/main/c...

2. https://huggingface.co/Zyphra/Zamba2-7B-Instruct#prerequisit...
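A minimal sketch of the check described above, assuming a Hugging Face style `config.json` with an `architectures` field. The helper names `architectures_from_config` and `load_for_inference` are hypothetical, and the loader only works if the installed transformers build (per the model card, Zyphra's fork) actually registers the architecture.

```python
import json


def architectures_from_config(config_text: str) -> list[str]:
    """Return the `architectures` list from a config.json string (empty if absent)."""
    config = json.loads(config_text)
    return list(config.get("architectures", []))


def load_for_inference(model_id: str):
    """Load tokenizer and model through the transformers Auto classes.

    Assumption: the installed transformers build supports the architecture
    named in config.json; if it does not, loading is expected to fail.
    """
    # Imported lazily so the config check above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

With a local copy of the file, something like `architectures_from_config(open("config.json").read())` shows which model class the inference code needs to provide before you try `load_for_inference(...)`.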


hidelooktropic 614 days ago

To run GGUF files? LM Studio for one. I think recurse on macOS as well, and probably some others.


x_may 614 days ago

As another commenter said, this has no GGUF because it’s partially Mamba-based, which is unsupported in llama.cpp.


xyc 613 days ago

dev of https://recurse.chat/ here, thanks for mentioning! rn we are focusing on features like shortcuts/floating window, but will look into supporting this at some point. to add to the llama.cpp support discussion, it's also worth noting that llama.cpp does not yet support GPU for Mamba models: https://github.com/ggerganov/llama.cpp/issues/6758


wazoox 614 days ago

GPT4All is a good and easy way to run GGUF models.
