by germanjoey
604 days ago
They said in the announcement that they've implemented speculative decoding, so that might have a lot to do with it. A big question is what they're using as their draft model; there are ways to do it losslessly, but they could also choose to trade off accuracy for a bigger increase in speed. It also seems they only support a very short sequence length (1k tokens).
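For context, here is a minimal sketch of the standard lossless acceptance rule from the speculative sampling literature. It is not necessarily what they implemented. The draft model proposes a token from its distribution q, and the target model scores it with p. The token is kept with probability min(1, p/q), and on rejection a replacement is drawn from the normalized max(0, p - q). The function names and toy distributions are hypothetical. Because the resulting output distribution is exactly p, this variant costs no accuracy. A lossy variant might instead loosen the acceptance test to keep more draft tokens, trading fidelity to p for speed.

```python
import random
from typing import List

# Hypothetical helper names; p = target-model probs, q = draft-model probs
# over the same toy vocabulary (not a real model).


def residual_distribution(p: List[float], q: List[float]) -> List[float]:
    """Normalized max(0, p - q): where to resample after a rejection."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0.0:
        # p == q, so rejections never happen; fall back to p itself.
        return list(p)
    return [d / total for d in diff]


def verify_draft_token(token: int, p: List[float], q: List[float],
                       rng: random.Random) -> int:
    """Keep the draft token with prob min(1, p/q); otherwise resample."""
    # q[token] > 0 because the token was sampled from q.
    accept_prob = min(1.0, p[token] / q[token])
    if rng.random() < accept_prob:
        return token
    residual = residual_distribution(p, q)
    return rng.choices(range(len(p)), weights=residual, k=1)[0]


def output_distribution(p: List[float], q: List[float]) -> List[float]:
    """Exact distribution of the verified token (should equal p)."""
    accepted = [min(pi, qi) for pi, qi in zip(p, q)]  # q * min(1, p/q)
    reject_mass = 1.0 - sum(accepted)
    residual = residual_distribution(p, q)
    return [a + reject_mass * r for a, r in zip(accepted, residual)]


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    q = [0.2, 0.5, 0.3]
    print(output_distribution(p, q))  # matches p up to float rounding

    # Monte Carlo check: draft from q, verify against p.
    rng = random.Random(0)
    counts = [0] * len(p)
    n = 20000
    for _ in range(n):
        draft = rng.choices(range(len(q)), weights=q, k=1)[0]
        counts[verify_draft_token(draft, p, q, rng)] += 1
    print([c / n for c in counts])
```

The speedup comes from the fact that the target model can verify several draft tokens in one forward pass; how often drafts are accepted depends on how closely q tracks p, which is why the choice of draft model matters so much.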