by ignoramous 875 days ago
At that level of quantization / distillation, smaller models like phi-2 (Q&A) and wavecoder-6.7b (code-gen) might be preferable to QLoRA'd ones: https://huggingface.co/microsoft/phi-2

> 2bit is pretty damn terrible

Wait till you go hybrid [0] or even 1-bit [1].

[0] https://github.com/efeslab/Atom

[1] https://github.com/IST-DASLab/qmoe