by M4v3R 929 days ago
Try different quantization variants. I got vastly different speeds depending on which one I chose. I believe q4_0 worked very well for me, although for a 7B model q8_0 also runs just fine and gives better quality.
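A rough back-of-the-envelope sketch of why the choice matters. It assumes the GGML-style block layouts used by llama.cpp. In q4_0, each block of 32 weights stores one fp16 scale plus 32 four-bit values. In q8_0, each block stores one fp16 scale plus 32 int8 values. It also treats "7B" as a nominal 7e9 parameters and assumes every tensor is quantized. Real model files keep some tensors at higher precision, so actual sizes will differ somewhat.

Under those assumptions, q4_0 works out to about 4.5 bits per weight (~3.9 GB for 7B), and q8_0 to about 8.5 bits per weight (~7.4 GB). Token generation on consumer hardware is often limited by memory bandwidth, so moving fewer bytes per token tends to be faster. That matches the speed differences between quants.

```python
# Rough weight-size estimate for GGML-style block quantization.
# Assumption: q4_0 and q8_0 both use blocks of 32 weights with one fp16 scale each,
# and every weight in the model is quantized (real files keep some tensors larger).
BLOCK = 32
FP16_SCALE_BYTES = 2

BYTES_PER_BLOCK = {
    "q4_0": FP16_SCALE_BYTES + BLOCK // 2,  # 32 x 4-bit values = 16 bytes, 18 total
    "q8_0": FP16_SCALE_BYTES + BLOCK,       # 32 x 8-bit values = 32 bytes, 34 total
}


def bits_per_weight(quant: str) -> float:
    """Average storage cost per weight, including the per-block scale."""
    return BYTES_PER_BLOCK[quant] * 8 / BLOCK


def approx_size_gb(n_params: float, quant: str) -> float:
    """Approximate weight size in GB (1e9 bytes) for a given parameter count."""
    return n_params * bits_per_weight(quant) / 8 / 1e9


if __name__ == "__main__":
    for q in ("q4_0", "q8_0"):
        print(f"{q}: {bits_per_weight(q):.1f} bits/weight, "
              f"~{approx_size_gb(7e9, q):.1f} GB for a nominal 7B model")
```

Fewer bits per weight is not free, though. q4_0 loses more precision than q8_0, which is the quality tradeoff mentioned above.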