by spidersouris 111 days ago
Note that a similar idea was already proposed by Shen et al. (2025) in "Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism" (https://arxiv.org/abs/2506.01979), though with lower performance.