HN Mirror



	by nl 1208 days ago
	On a CPU I'd estimate it would get a maximum of around 5 tokens per second (a token being a sub-word unit, so generally a couple of letters). I suspect it'd be more like 1 token per second on the large model without additional optimisation. Yes, models can be split up; see e.g. Hugging Face Accelerate.
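To put those rates in perspective, here is a rough back-of-envelope sketch of how long a reply would take at 1 and 5 tokens per second. The 300-token reply length and the `generation_seconds` helper are hypothetical, and the sketch assumes a constant decoding rate, ignoring prompt processing and model load time.

```python
# Back-of-envelope generation time, assuming a constant decoding rate.
# The 1 and 5 tokens/s figures come from the estimate above; the
# 300-token reply length is a hypothetical example.

def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time (in seconds) to generate num_tokens at a fixed rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    reply_tokens = 300  # hypothetical reply length
    for rate in (1.0, 5.0):
        secs = generation_seconds(reply_tokens, rate)
        print(f"{rate:.0f} tok/s -> {secs:.0f} s ({secs / 60:.1f} min)")
```

At 1 token/s a 300-token reply takes about five minutes; at 5 tokens/s, about one minute.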


davrosthedalek 1208 days ago

That's actually a lot better than I would have thought. Almost usable, and a good exercise in patience.


nl 1208 days ago

I'd expect significant performance improvements over the next few months as more people work on this, in the same way that Stable Diffusion is now fairly usable on a CPU. It's always going to be slow on a CPU, but the smaller models might be usable for experimentation at some point.
