Hacker News
Llama.cpp speculative sampling: 2x faster inference for large models
(github.com)
4 points by bobivl 1019 days ago | 1 comment
xchip 1019 days ago
Small caveat: the 2x speedup only holds when generating text with a simple, predictable grammar, like code. For natural-language text it doesn't work as well.
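
To make the caveat concrete, here is a rough sketch of why predictability matters. It assumes the usual speculative-sampling setup: a small draft model proposes k tokens, and the large model accepts each one independently with probability alpha. The function name, the i.i.d. assumption, and the alpha values are illustrative and not taken from the llama.cpp code. It also ignores the cost of running the draft model, so it gives an upper bound on tokens per large-model pass, not a wall-clock speedup.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per large-model pass.

    alpha: assumed probability that a drafted token is accepted (i.i.d.).
    k:     number of tokens the draft model proposes per step.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Every drafted token is accepted, plus one token from the large model.
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Hypothetical acceptance rates: higher for code-like text, lower for prose.
    for label, alpha in [("predictable (code-like)", 0.8), ("less predictable (prose)", 0.5)]:
        print(label, round(expected_tokens_per_step(alpha, k=4), 2))
```

The lower the acceptance rate, the fewer drafted tokens survive each step, and the gain over plain decoding shrinks toward one token per pass.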