HN Mirror



	by cypress66 979 days ago
	>I'm running gpt2-xl (1.5B params) locally with KV caching at 120ms/token (vs. 450ms without caching).

	That seems very slow compared to llama.cpp?
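The quoted numbers show KV caching cutting per-token latency from 450 ms to 120 ms. The sketch below illustrates why, under loudly stated assumptions: it uses a single attention head, a single layer, tiny made-up random weights, and hypothetical function names (`decode_without_cache`, `decode_with_cache`). It is not how gpt2-xl or llama.cpp are actually implemented. Without a cache, every step re-projects keys and values for the whole prefix; with a cache, each step projects only the newest token and appends its K/V, yet the outputs come out the same.

```python
import math
import random


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    # Scaled dot-product attention of one query over all given positions.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def decode_without_cache(xs, Wq, Wk, Wv):
    # Recompute K and V for every prefix position at every step.
    outputs = []
    for t in range(1, len(xs) + 1):
        prefix = xs[:t]
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        outputs.append(attend(matvec(Wq, prefix[-1]), keys, values))
    return outputs


def decode_with_cache(xs, Wq, Wk, Wv):
    # Project only the newest token and append its K/V to the cache.
    k_cache, v_cache, outputs = [], [], []
    for x in xs:
        k_cache.append(matvec(Wk, x))
        v_cache.append(matvec(Wv, x))
        outputs.append(attend(matvec(Wq, x), k_cache, v_cache))
    return outputs


def random_matrix(rows, cols):
    # Made-up weights for illustration only.
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d = 4
Wq, Wk, Wv = random_matrix(d, d), random_matrix(d, d), random_matrix(d, d)
xs = random_matrix(5, d)  # five hypothetical token embeddings

full = decode_without_cache(xs, Wq, Wk, Wv)
cached = decode_with_cache(xs, Wq, Wk, Wv)
```

In a real multi-layer model the uncached path reruns the entire forward pass over the full prefix for each new token, which is where most of the saved time comes from; the attention scores themselves still grow with context length in both paths.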


Yeah, I believe it is. You trade speed for lower power and CPU usage. Still, about 8 tokens/sec is usable.
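The "8 tokens/sec" figure follows directly from the quoted 120 ms/token latency. A quick check of the conversion, using only the numbers from the thread (the helper name `tokens_per_second` is hypothetical):

```python
def tokens_per_second(ms_per_token: float) -> float:
    # Convert per-token latency in milliseconds to throughput.
    return 1000.0 / ms_per_token


cached = tokens_per_second(120)    # with KV caching
uncached = tokens_per_second(450)  # without KV caching
speedup = 450 / 120                # latency ratio

print(f"cached: {cached:.2f} tok/s, uncached: {uncached:.2f} tok/s, speedup: {speedup:.2f}x")
# prints "cached: 8.33 tok/s, uncached: 2.22 tok/s, speedup: 3.75x"
```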