by moffkalast 1102 days ago
Very cool. By the way, quantization isn't mentioned in the README, so I assume it only runs full-precision models. Or do quantized formats (GGML, GPTQ, etc.) also work with it?

Hi there, 8-bit and 4-bit are currently supported on main. GPTQ support is a work in progress, as is GGML.
GPTQ support would be amazing. AutoGPTQ is an easy way to integrate it: it's basically just importing AutoGPTQ and switching out one line in the model loading code.
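For what it's worth, here is a rough sketch of what that "one line" swap could look like. It is not this project's actual loading code: `LOADERS`, `pick_loader`, and `load_model` are hypothetical names made up for illustration, and the `device="cuda:0"` argument is an assumption. The only library calls it relies on are `AutoModelForCausalLM.from_pretrained` from transformers (the `load_in_8bit` / `load_in_4bit` flags need bitsandbytes installed, and newer transformers releases prefer passing a `BitsAndBytesConfig` instead) and `AutoGPTQForCausalLM.from_quantized` from the `auto_gptq` package.

```python
import importlib

# Hypothetical table for illustration: mode -> (module, class, method, extra kwargs).
# Only the "gptq" row differs in which class and method do the loading.
LOADERS = {
    "full": ("transformers", "AutoModelForCausalLM", "from_pretrained", {}),
    "8bit": ("transformers", "AutoModelForCausalLM", "from_pretrained",
             {"load_in_8bit": True}),
    "4bit": ("transformers", "AutoModelForCausalLM", "from_pretrained",
             {"load_in_4bit": True}),
    # Assumption: a single CUDA device; adjust for your setup.
    "gptq": ("auto_gptq", "AutoGPTQForCausalLM", "from_quantized",
             {"device": "cuda:0"}),
}


def pick_loader(mode):
    """Return (module, class, method, kwargs) for a quantization mode."""
    try:
        return LOADERS[mode]
    except KeyError:
        raise ValueError(f"unsupported quantization mode: {mode!r}") from None


def load_model(model_path, mode="full"):
    """Load a causal LM from model_path using the loader for the given mode."""
    module_name, class_name, method_name, kwargs = pick_loader(mode)
    # Import lazily so auto_gptq is only required when GPTQ is requested.
    module = importlib.import_module(module_name)
    loader = getattr(getattr(module, class_name), method_name)
    return loader(model_path, **kwargs)
```

Calling `load_model(path, mode="gptq")` then goes through AutoGPTQ, while the other modes keep using the regular transformers path. GGML is left out on purpose, since it uses a different runtime and would not fit this kind of one-line swap.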