| HN Mirror



	by WhiteDawn 54 days ago
	Once someone generates an MTP layer for 26B A4B 4 QAT, I'll be singing from the hills with my 5-year-old GPU.

2 comments

pfheatwole 53 days ago

Models:

- Safetensors: https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-un...

- GGUF: https://huggingface.co/unsloth/gemma-4-26B-A4B-it-GGUF/tree/...

Note the README in the Unsloth list of files, which says llama.cpp has a PR in progress to support the gemma4 drafters: https://github.com/ggml-org/llama.cpp/pull/23398. Also note that the PR submitter didn't see much speedup with 26B (it seems typical that MoE models don't benefit much from MTP).
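
For a rough feel of why a drafter can fail to pay off, here is a back-of-the-envelope sketch using the standard speculative-decoding estimate. It is not taken from the PR. The function names, the independence assumption on token acceptance, and the `verify_cost` knob are all illustrative assumptions. One commonly cited explanation for MoE models is that verifying several tokens in one pass touches more experts than a single decode step, which is modeled here as `verify_cost` above 1.0.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    Assumes each drafted token is accepted independently with probability
    accept_rate, and that every pass yields at least one token.
    """
    if accept_rate >= 1.0:
        return draft_len + 1.0
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def estimated_speedup(accept_rate: float, draft_len: int,
                      draft_cost: float, verify_cost: float = 1.0) -> float:
    """Estimated speedup over plain one-token-at-a-time decoding.

    All costs are relative to one normal decode step of the main model:
      draft_cost  - cost of producing one drafted token (assumed, e.g. 0.05)
      verify_cost - cost of checking the whole draft in one pass
                    (about 1.0 if decoding is memory-bound; assumed to be
                    higher for MoE, where more experts get activated)
    """
    tokens = expected_tokens_per_step(accept_rate, draft_len)
    step_cost = draft_len * draft_cost + verify_cost
    return tokens / step_cost


if __name__ == "__main__":
    # Illustrative numbers only, not measurements of any real model.
    for verify_cost in (1.0, 1.5, 2.0):
        s = estimated_speedup(accept_rate=0.8, draft_len=3,
                              draft_cost=0.05, verify_cost=verify_cost)
        print(f"verify_cost={verify_cost:.1f} -> estimated speedup {s:.2f}x")
```

With the same acceptance rate, the estimate shrinks quickly as the verification pass gets more expensive, which would be consistent with the small gains reported for the 26B MoE model.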


dist-epoch 54 days ago

Google already did

https://huggingface.co/google/gemma-4-26B-A4B-it-qat-q4_0-un...


dofm 54 days ago

This is safetensors. Is there any way to run these on a Mac paired with the MLX QAT?

(Pardon my ignorance; this stuff moves so fast)


thangalin 54 days ago

Did you see this?

https://point.free/blog/gemma-4-on-a-2016-xeon/

Xeon, but could be useful for MTP on Mac.


dofm 54 days ago

I hadn't seen this, thanks.

I do have the Qwen 3.6 (35B) MTP implementation running (in LM Studio; it doesn't need a separate drafter), along with non-MTP Gemma 4 26B, and I can see that Unsloth Studio can run the new QAT, but I can't see how you can run the assistant/drafter. Yet.

It's just a constantly changing landscape. Don't get me wrong, it's fascinating and for various reasons I am pleased I can keep up even slightly, but eeeehhh :-)


dofm 51 days ago

To briefly follow up, as of yesterday llama.cpp can do Gemma 4's MTP, so I have this working at least initially — details here:

https://news.ycombinator.com/item?id=48441450


int_19h 53 days ago

https://huggingface.co/lmstudio-community/gemma-4-26B-A4B-it...


dofm 53 days ago

Yeah — that is the base QAT model, and there are safetensors weights for the QAT version of the MTP drafter, but there are no MLX/GGUF versions. I think the answer is a combination of:

1) Gemma 4 MTP is too fresh for off-the-shelf software to use anyway

2) "you can convert them yourself" which is fine, obvs
