Hacker News
by Tostino 1070 days ago
Exllama is significantly faster if you can fit the whole model in VRAM.