by cypress66 979 days ago
>I'm running gpt2-xl (1.5B params) locally with KV caching at 120ms/token (vs. 450ms without caching).

That seems very slow compared to llama.cpp?

1 comment

Yeah, I believe it is. You trade speed for lower power and CPU usage. About 8 tokens/sec (120ms/token) is usable, though.