by luyu_wu
633 days ago
The section on speculative execution is interesting.
"This approach allows each forward pass to generate multiple tokens without compromising performance, thereby significantly reducing memory access consumption, and enabling several orders of magnitude speed improvements." Does anyone know if the "several orders of magnitude speed improvement" is accurate? I'm doubtful. Very interesting though! I'll be playing around with this on the weekend!
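
A rough way to sanity-check the claim, as a back-of-envelope sketch rather than anything taken from the linked project: assume the quoted passage describes standard draft-and-verify speculative decoding, where a cheap draft proposes k tokens and each one is accepted independently with probability alpha. Under that simplified model the expected number of tokens per forward pass of the large model is (1 - alpha^(k+1)) / (1 - alpha), which can never exceed k + 1. The function name, alpha, and k below are illustrative assumptions, and the cost of the draft model is ignored, so the result is an optimistic upper bound on speedup.

```python
def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per large-model forward pass.

    Assumes (hypothetically) that each of the k drafted tokens is accepted
    independently with probability alpha, and that the verification pass
    always contributes one token of its own.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus the one from verification.
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Illustrative acceptance rates and draft lengths, not measured values.
    for alpha in (0.6, 0.8, 0.9):
        for k in (4, 8, 16):
            tokens = expected_tokens_per_pass(alpha, k)
            print(f"alpha={alpha:.1f} k={k:2d} -> about {tokens:.2f} tokens per pass")
```

Under these assumptions the gain is capped at k + 1 tokens per pass, so "several orders of magnitude" would need hundreds of drafted tokens accepted almost every time. A single-digit multiple looks more plausible, but this is only a toy model.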