Hacker News
by vanillax 976 days ago
- GPTQ: pure GPU inference, used with AutoGPTQ, exllama, exllamav2, offers only 4-bit quantization

What are AutoGPTQ and exllama, and what does it mean that it only works with AutoGPTQ and exllama? Are those frameworks like TensorFlow?
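To make the "4-bit quantization" part of the quoted line concrete, here is a minimal sketch of plain round-to-nearest 4-bit quantization: each weight in a group becomes a signed integer in [-8, 7] sharing one float scale, and two such values are packed into one byte. This is an illustrative assumption, not how AutoGPTQ or exllama implement it. GPTQ itself also compensates for rounding error using approximate second-order information. All function names here are hypothetical.

```python
# Hypothetical sketch: symmetric 4-bit round-to-nearest (RTN) quantization.
# NOTE: this is NOT the GPTQ algorithm; GPTQ additionally adjusts the
# remaining weights to compensate for each rounding error.

def quantize_4bit(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to signed 4-bit integers in [-8, 7] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def dequantize_4bit(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the 4-bit integers."""
    return [v * scale for v in q]


def pack_nibbles(q: list[int]) -> bytes:
    """Pack pairs of signed 4-bit values into single bytes (low nibble first)."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    out = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_nibbles(data: bytes, n: int) -> list[int]:
    """Inverse of pack_nibbles: recover the first n signed 4-bit values."""
    vals = []
    for b in data:
        for nib in (b & 0xF, b >> 4):
            vals.append(nib - 16 if nib >= 8 else nib)  # sign-extend
    return vals[:n]


weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.49]
q, scale = quantize_4bit(weights)
packed = pack_nibbles(q)
print(len(weights) * 4, "bytes as float32 ->", len(packed), "bytes packed")
# prints: 24 bytes as float32 -> 3 bytes packed
```

The roughly 8x size reduction versus float32 (plus a small overhead for the scales) is why 4-bit models fit on consumer GPUs; the runtime then needs kernels that unpack and dequantize these values on the fly during inference.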