Hacker News
Towards Optimal LLM Quantization (picovoice.ai)
18 points by bejager 751 days ago
5 comments

How does it compare with AWQ, SqueezeLLM, or newer quantization methods?
How do you integrate with vLLM?
Is there a way for me to compress a custom fine-tuned model of my own?
Not yet, but it's something we have in mind as a future feature.
Decent platform support - any plans for a Rust SDK?
We're continuously expanding SDK support, and Rust is on the list.
Any benchmarks with Falcon 2?
We don't support Falcon 2 yet, but new models are always on our radar to be added to the platform.