	by crabbycarrot 1251 days ago
	Highly recommend quantizing the model (https://pytorch.org/tutorials/recipes/recipes/dynamic_quanti...). I converted the large model to int8, and it now runs at 5x real-time on CPU, with fairly low RAM requirements and still very good quality.
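
	For anyone who wants to try this, here is a rough sketch of what int8 dynamic quantization looks like in PyTorch. It is not the commenter's actual code: `TinyModel` and `quantize_int8` are hypothetical stand-ins made up for illustration, and it assumes the model's heavy layers are `nn.Linear`. Swap in your own loaded model where `TinyModel()` is created.

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for a real model; only here so the sketch runs."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(256, 512)
        self.fc2 = nn.Linear(512, 128)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


def quantize_int8(model: nn.Module) -> nn.Module:
    """Return a copy of `model` whose nn.Linear layers use int8 weights.

    Dynamic quantization stores weights as int8 and quantizes activations
    on the fly at inference time, so no calibration data is needed.
    """
    model.eval()
    return torch.quantization.quantize_dynamic(
        model,              # original float32 model (left unmodified by default)
        {nn.Linear},        # assumption: Linear layers dominate the compute
        dtype=torch.qint8,
    )


model_fp32 = TinyModel()
model_int8 = quantize_int8(model_fp32)

# Run both on the same CPU input to compare outputs.
x = torch.randn(4, 256)
with torch.no_grad():
    out_fp32 = model_fp32(x)
    out_int8 = model_int8(x)

print(model_int8)
print("max abs difference:", (out_fp32 - out_int8).abs().max().item())
```

	How much speed and memory you actually gain depends on the model and the CPU, so it is worth measuring both and checking output quality on your own data before committing to the quantized version.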