by csdvrx
1129 days ago
Have you tried quantization? It's often a cheap and simple way to reduce VRAM requirements. What hardware are you using (CPU, RAM, GPU, VRAM)? Have you considered using llama.cpp for mixed CPU+GPU use (if you have enough RAM)?
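To give a rough sense of why quantization helps, here is a back-of-the-envelope sketch in Python. It only estimates the memory taken by the weights themselves, as parameter count times bits per weight. The 7B parameter count and the helper name `weight_memory_gib` are assumptions for illustration, not anything from the thread, and real quantized formats add some overhead (scales, metadata) on top of activations and the KV cache, so treat the numbers as a lower bound.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Ignores activations, KV cache, and per-block quantization overhead,
    so the real footprint will be somewhat higher.
    """
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    n_params = 7e9  # hypothetical 7B-parameter model (assumption)
    for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:>5}: ~{weight_memory_gib(n_params, bits):.1f} GiB")
```

If even the quantized weights don't fit in VRAM, that is where a mixed CPU+GPU setup comes in: part of the model stays in system RAM and the rest goes on the GPU.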