by onlyrealcuzzo
28 days ago
> I don't disagree, but how much of this ends up being distillation?

You don't need distillation. They already have the training sets. It's MLA + MoE + Medusa (a better version of speculative decoding) + 1.58b (possibly; maybe nothing) + GRAM (which will almost certainly not turn out to be a nothing burger, but no one has turned it around quickly enough to prove it yet).