Hacker News
Llama.cpp speculative sampling: 2x faster inference for large models (github.com)
4 points by bobivl 1019 days ago
1 comment

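Small caveat: the 2x speedup only holds when generating text with a simple, predictable grammar, like code. For natural-language text it doesn't work nearly as well, because the small draft model guesses the large model's next tokens less often.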
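
To make the caveat concrete, here is a rough back-of-the-envelope sketch, not anything taken from the llama.cpp code. It assumes each drafted token is accepted independently with a fixed probability and ignores the cost of running the draft model. The function name and the acceptance rates are made up for illustration, not measurements.

```python
def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per pass of the large model.

    Assumption (hypothetical model, not llama.cpp's accounting): each of the
    draft_len drafted tokens is accepted independently with probability
    accept_rate, and the large model always contributes one token itself.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_rate == 1.0:
        # Every drafted token is kept, plus one token from the large model.
        return float(draft_len + 1)
    # Geometric series: 1 + a + a^2 + ... + a^draft_len
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


# Illustrative acceptance rates only; real values depend on the model pair.
code_like = expected_tokens_per_pass(accept_rate=0.8, draft_len=4)
prose_like = expected_tokens_per_pass(accept_rate=0.5, draft_len=4)
print(f"predictable text: {code_like:.2f} tokens per large-model pass")
print(f"less predictable text: {prose_like:.2f} tokens per large-model pass")
```

Under these assumptions the tokens gained per large-model pass fall off quickly as the acceptance rate drops, which is one way to read why the speedup shrinks on prose. The real speedup is lower still once the draft model's own runtime is counted.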