Note the README in the Unsloth list of files: llama.cpp is working on a PR to support the gemma4 drafters (https://github.com/ggml-org/llama.cpp/pull/23398). Also note the PR submitter didn't see much speedup with 26B (it seems typical that MoE models don't benefit much from MTP).
I do have the Qwen 3.6 (35B) MTP implementation running (in LM Studio; it doesn't need a separate drafter), along with non-MTP Gemma 4 26B, and I can see that Unsloth Studio can run the new QAT, but I can't see how you can run the assistant/drafter. Yet.
It's just a constantly changing landscape. Don't get me wrong, it's fascinating and for various reasons I am pleased I can keep up even slightly, but eeeehhh :-)
Yeah, that is the base QAT model. There are safetensors weights for the QAT version of the MTP drafter, but no MLX/GGUF versions. I think the answer is a combination of:
1) Gemma 4 MTP is too fresh for off-the-shelf software to use anyway
2) "you can convert them yourself" which is fine, obvs
- Safetensors: https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-un...
- GGUF: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/...