by lubitelpospat
80 days ago
If you're running litert-lm on an Apple Silicon Mac, DO NOT forget to pass "--backend gpu"! On my M1 Pro laptop, this single setting gave 10x faster prefill and 2x faster decode.
To anyone who knows how the internals of litert-lm work: what quantization does it use? How is the model only 3.4 GB? EDIT: typo fix.
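
For a rough sense of what a 3.4 GB file implies, here is a back-of-the-envelope sketch of average bits per weight. The parameter counts are hypothetical placeholders (the model's real size isn't stated here), and it assumes decimal gigabytes and that the whole file is weights, which ignores tokenizer data, metadata, and any mixed-precision layers.

```python
def effective_bits_per_weight(file_size_gb: float, n_params_billion: float) -> float:
    """Average bits stored per parameter.

    Assumes decimal GB (1e9 bytes) and that the entire file is weights.
    """
    size_bits = file_size_gb * 1e9 * 8
    return size_bits / (n_params_billion * 1e9)


# Hypothetical parameter counts, only to show how the estimate moves.
for n_params_billion in (2.0, 4.0, 8.0):
    bits = effective_bits_per_weight(3.4, n_params_billion)
    print(f"{n_params_billion:.0f}B params -> {bits:.1f} bits/weight")
```

If the result lands well below 16, the weights are almost certainly quantized, but this alone can't say which scheme is used.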