Hacker News
by simjnd 57 days ago
For TurboQuant on model weights, AFAIK it's currently a single-person effort [1]. It requires his fork of llama.cpp, which hasn't been upstreamed. He publishes his quantizations on HuggingFace, but I'm not sure whether he has open-sourced the quantization pipeline.

[1]: https://x.com/coffeecup2020