Hacker News
by
cypress66
979 days ago
>I'm running gpt2-xl (1.5B params) locally with KV caching at 120ms/token (vs. 450ms without caching).
That seems very slow compared to llama.cpp?
1 comment
smpanaro
979 days ago
Yeah, I believe it is. You trade speed for lower power and CPU usage. 8 tokens/sec is usable, though.
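For reference, the latencies quoted in this thread convert to throughput as sketched below. The helper name `tokens_per_second` is illustrative, not from the thread; the only inputs are the 120 ms/token and 450 ms/token figures quoted above.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / ms_per_token


# Figures quoted in the thread for gpt2-xl (1.5B params).
cached = tokens_per_second(120)    # with KV caching
uncached = tokens_per_second(450)  # without KV caching
speedup = 450 / 120                # latency ratio, uncached vs. cached

print(f"cached: {cached:.1f} tok/s, uncached: {uncached:.1f} tok/s, "
      f"speedup: {speedup:.2f}x")
```

So 120 ms/token is roughly 8.3 tokens/sec, which matches the "8 tokens/sec" figure, and caching is about 3.75x faster than the uncached 450 ms/token.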