by jwitthuhn 613 days ago
For anyone else looking for the weights, which as far as I can tell are not linked in the article:

Base model: https://huggingface.co/Zyphra/Zamba2-7B

Instruct tuned: https://huggingface.co/Zyphra/Zamba2-7B-Instruct


I couldn't find any gguf files yet. Looking forward to trying it out when they're available.
It seems that Zamba2 isn't supported yet. The issue for the previous model is here:

Feature Request: Support Zyphra/Zamba2-2.7B #8795 (open, opened by tomasmcm on Jul 31)

https://github.com/ggerganov/llama.cpp/issues/8795

What can be used to run it? I had imagined Mamba-based models need different inference code/software than other models.
If you look in the `config.json`[1] it shows `Zamba2ForCausalLM`. You can run inference with a version of the transformers library that supports that class.

The model card states that you have to use their fork of transformers.[2]

1. https://huggingface.co/Zyphra/Zamba2-7B-Instruct/blob/main/c...

2. https://huggingface.co/Zyphra/Zamba2-7B-Instruct#prerequisit...
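A rough sketch of that check, in case it helps: read the `architectures` field from a `config.json` and see whether the installed library exposes a class with that name. The helper names (`declared_architectures`, `library_supports`) and the trimmed `SAMPLE_CONFIG` string are made up for illustration, not copied from the model repo, and finding the class by name is only a hint that the install might work, not proof.

```python
import importlib
import json


def declared_architectures(config_text: str) -> list:
    """Return the class names listed under "architectures" in a config.json."""
    config = json.loads(config_text)
    return list(config.get("architectures", []))


def library_supports(class_name: str, module_name: str = "transformers") -> bool:
    """True if the module imports and exposes class_name, False otherwise."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Library not installed at all.
        return False
    return hasattr(module, class_name)


# Illustrative excerpt only (an assumption): the real config.json has many more fields.
SAMPLE_CONFIG = '{"architectures": ["Zamba2ForCausalLM"]}'

if __name__ == "__main__":
    for name in declared_architectures(SAMPLE_CONFIG):
        print(name, "available:", library_supports(name))
```

If this reports the class as unavailable, that would be consistent with the model card's note about needing their fork of transformers.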

To run GGUF files? LM Studio for one. I think Recurse on macOS as well, and probably some others.
As another commenter said, this has no GGUF because it's partially Mamba-based, which is unsupported in llama.cpp.
dev of https://recurse.chat/ here, thanks for mentioning! Right now we are focusing on features like shortcuts and a floating window, but will look into supporting this at some point. To add to the llama.cpp support discussion, it's also worth noting that llama.cpp does not yet support GPU for Mamba models: https://github.com/ggerganov/llama.cpp/issues/6758
GPT4All is a good and easy way to run GGUF models.
Mamba-based stuff tends to take longer to become available.