|
|
|
|
|
by miven
731 days ago
|
|
> For on-device inference, we use low-bit palletization, a critical optimization technique that achieves the necessary memory, power, and performance requirements. Did they go over the entire text with a thesaurus? I've never seen "palletization" be used as a viable synonym for "quantization" before, and I've read quite a few papers on LLM quantization |
|