Towards Optimal LLM Quantization

	Towards Optimal LLM Quantization (picovoice.ai)
	18 points by bejager 751 days ago

5 comments

How does it compare with AWQ, SqueezeLLM, or newer quantization methods?

How do you integrate with vLLM?

Is there a way for me to compress a custom fine-tuned model of my own?

Not yet, but it's something we have in mind as a future feature.

Decent platform support - any plans for a Rust SDK?

We are continuously expanding SDK support, and Rust is on the list.

Any benchmarks with Falcon 2?

We don't support Falcon 2 yet, but new models are always on our radar to be added to the platform.