| HN Mirror

	by lubitelpospat 80 days ago
	If you're using litert-lm on a Mac with Apple Silicon, DO NOT forget to pass "--backend gpu"! On my M1 Pro laptop this single setting made prefill 10x faster and decode 2x faster. To anyone who knows how the internals of litert-lm work: what quantization does it use? How come the model is just 3.4 GB in size? EDIT: typo fix.
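
A rough way to sanity-check the size question is to turn the file size into an average number of bits per weight. This is only a back-of-envelope sketch: the 3.4 GB figure comes from the comment above, but the parameter counts are hypothetical placeholders (the comment does not name the model), and the calculation treats the whole file as weights, ignoring metadata, tokenizer data, and any tensors kept at higher precision.

```python
def bits_per_weight(file_size_gb: float, n_params_billion: float) -> float:
    """Average bits stored per parameter, treating the whole file as weights."""
    size_bits = file_size_gb * 1e9 * 8          # decimal gigabytes -> bits
    return size_bits / (n_params_billion * 1e9)


# 3.4 GB is the size quoted in the comment; the parameter counts below are
# hypothetical, chosen only to show how the estimate moves with model size.
for n_billion in (2.0, 4.0, 8.0):
    estimate = bits_per_weight(3.4, n_billion)
    print(f"{n_billion:.0f}B params -> {estimate:.1f} bits/weight")
```

An estimate well under 16 bits per weight would point to some form of weight quantization rather than fp16 storage, but it cannot say which scheme is used; that still needs an answer from someone who knows the litert-lm internals.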