by alchemist1e9 614 days ago
What can be used to run it? I had imagined Mamba-based models need different inference code/software than the other models.
3 comments

If you look in the `config.json`[1] it shows `Zamba2ForCausalLM`. You can run inference with any version of the transformers library that supports that architecture.

The model card states that you have to use their fork of transformers.[2]

1. https://huggingface.co/Zyphra/Zamba2-7B-Instruct/blob/main/c...

2. https://huggingface.co/Zyphra/Zamba2-7B-Instruct#prerequisit...
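
For anyone who wants to check this locally, here is a rough sketch of the idea above: read the `architectures` field out of a `config.json` and see whether the installed transformers package exposes a class with that name. The helper names (`declared_architectures`, `transformers_supports`) and the trimmed-down sample config are made up for illustration; the real `config.json` has many more fields.

```python
import importlib
import json


def declared_architectures(config_text: str) -> list[str]:
    """Return the class names listed under "architectures" in a config.json."""
    config = json.loads(config_text)
    return list(config.get("architectures", []))


def transformers_supports(class_name: str) -> bool:
    """True if the installed transformers package exposes `class_name`.

    Returns False when transformers is not installed or the lookup fails.
    """
    try:
        transformers = importlib.import_module("transformers")
        return hasattr(transformers, class_name)
    except Exception:
        return False


# Hypothetical, trimmed-down config.json contents (assumption for illustration).
sample_config = '{"architectures": ["Zamba2ForCausalLM"]}'

for name in declared_architectures(sample_config):
    status = "available" if transformers_supports(name) else "missing"
    print(f"{name}: {status} in the installed transformers")
```

If the class shows up as missing, that would be consistent with the model card's note about needing their fork of transformers.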

To run GGUF files? LM Studio, for one. I think Recurse on macOS as well, and probably some others.
As another commenter said, this has no GGUF because it’s partially Mamba-based, which is unsupported in llama.cpp.
Dev of https://recurse.chat/ here, thanks for mentioning! Right now we are focusing on features like shortcuts/floating window, but we will look into supporting this at some point. To add to the llama.cpp support discussion, it's also worth noting that llama.cpp does not yet support GPU for Mamba models: https://github.com/ggerganov/llama.cpp/issues/6758
GPT4All is a good and easy way to run GGUF models.