Hacker News
by
somewhatrandom9
6 days ago
Could these quantized models make MTP (Multi-Token Prediction) significantly faster when used as drafters for larger regular Gemma 4 models?
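A rough way to reason about this question is the usual speculative-decoding estimate. It assumes each drafted token is accepted independently with probability `alpha`, `gamma` tokens are drafted per step, and `cost_ratio` is the time of one drafter forward pass divided by one target forward pass. All numbers below are hypothetical, not measurements of Gemma 4 or its drafters.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target-model forward pass.

    Assumes gamma drafted tokens per step, each accepted independently
    with probability alpha (a simplification; real acceptance is correlated).
    """
    if alpha >= 1.0:
        return gamma + 1.0  # every draft accepted, plus one bonus token from the target
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def estimated_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Walltime speedup over plain autoregressive decoding with the target alone.

    cost_ratio = (drafter forward-pass time) / (target forward-pass time).
    Each step costs gamma drafter passes plus one target verification pass.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1)


# Hypothetical scenario: quantizing the drafter halves its cost
# but slightly lowers its acceptance rate.
baseline = estimated_speedup(alpha=0.80, gamma=4, cost_ratio=0.05)
quantized = estimated_speedup(alpha=0.78, gamma=4, cost_ratio=0.025)
print(f"baseline drafter:  {baseline:.2f}x")
print(f"quantized drafter: {quantized:.2f}x")
```

When the drafter is already much cheaper than the target, halving its cost only shrinks a small term in the denominator, so the gain is modest; a slightly larger drop in acceptance rate (for example to 0.75 in this toy model) is enough to cancel it.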
dist-epoch
6 days ago
Google already released specialized drafters for Gemma 4.
Havoc
6 days ago
The E2B ones? Or what do you mean by specialized drafters?
int_19h
6 days ago
They have "-assistant" in the name, e.g.:
https://huggingface.co/google/gemma-4-31B-it-assistant
Havoc
5 days ago
Thanks
girvo
6 days ago
The “-assistant” models released by Google are specialised tiny MTP draft models :)
31B-it-assistant is what enables MTP
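The draft-then-verify loop that a small drafter plugs into can be sketched as follows. This is a minimal sketch of generic greedy speculative decoding with a separate draft model, assuming toy next-token functions: `toy_target` and `toy_drafter` are hypothetical stand-ins, not Gemma models, and whether Google's "-assistant" checkpoints work exactly this way internally is not stated in the thread.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, drafter: NextToken,
                       prompt: List[int], max_new: int, gamma: int = 4) -> List[int]:
    """Greedy speculative decoding: the drafter proposes gamma tokens, the target verifies."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap drafter proposes gamma tokens autoregressively.
        draft: List[int] = []
        for _ in range(gamma):
            draft.append(drafter(out + draft))
        # 2. The target checks each drafted position. A real system scores all
        #    positions in ONE batched forward pass; this loop only simulates that.
        accepted: List[int] = []
        for tok in draft:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # target's token replaces the first mismatch
                break
            accepted.append(tok)
        else:
            # Every draft matched: the same target pass yields one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + max_new]


def toy_target(prefix: List[int]) -> int:
    # Hypothetical target: counts upward modulo 10.
    return (prefix[-1] + 1) % 10


def toy_drafter(prefix: List[int]) -> int:
    # Hypothetical drafter: agrees with the target except right after a 5.
    return 0 if prefix[-1] == 5 else (prefix[-1] + 1) % 10


print(speculative_decode(toy_target, toy_drafter, [0], max_new=8))
```

Every emitted token is the one the target would have picked at that position, so the output matches plain greedy decoding with the target alone; the drafter only changes how many target passes are needed, which is where the speedup comes from.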