by somewhatrandom9 6 days ago
Could these quantized models make MTP (Multi-Token Prediction) significantly faster when used as drafters for larger regular Gemma 4 models?

Google already released specialized drafters for Gemma 4.
The E2B ones? Or what do you mean by specialized drafters?
They have "-assistant" in the name, for example: https://huggingface.co/google/gemma-4-31B-it-assistant
Thanks
The “-assistant” models released by Google are specialised tiny MTP draft models :)

gemma-4-31B-it-assistant is the draft model that enables MTP.
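For anyone wondering why a small drafter can speed up a larger model at all, here is a toy sketch of the general draft-and-verify idea. It is not Gemma's actual MTP implementation: the `target` and `draft` functions are made-up stand-ins for models, decoding is greedy, and the "one target pass per round" count assumes a real model would verify all drafted positions in a single batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function (hypothetical stand-in for a model)


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int,
                       k: int = 4) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify. Returns (tokens, number of target verification rounds)."""
    out = list(prompt)
    goal = len(prompt) + max_new
    rounds = 0
    while len(out) < goal:
        # 1) The cheap drafter proposes up to k tokens.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))

        # 2) The target checks the proposal. A real model would score every
        #    drafted position in one forward pass, so we count one round.
        rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if expected != tok:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
            accepted.append(tok)
        out.extend(accepted)
    return out, rounds


# Toy "models": the target counts upward mod 10; the drafter agrees
# except right after a 4, where it guesses wrong.
def target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


tokens, rounds = speculative_decode(target, draft, [0], max_new=8, k=4)
print(tokens, rounds)
```

The output is identical to what the target alone would produce, since every drafted token is checked against it. The saving comes from needing fewer target rounds than generated tokens, which only pays off when the drafter is both much cheaper than the target and right most of the time.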