by brucethemoose2
930 days ago
My recommendation is:

- Exui with exl2 files on good GPUs.
- Koboldcpp with gguf files for small GPUs and Apple silicon.

There are many reasons, but in a nutshell they are the fastest and most VRAM efficient. I can fit 34Bs with about 75K context on a single 24GB 3090 before the quality drop from quantization really starts to get dramatic.
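For anyone who wants to sanity-check a figure like "34B with about 75K context in 24GB", here is a rough back-of-the-envelope sketch. It only adds up quantized weights plus KV cache and ignores activations and framework overhead. The function name `estimate_vram_gib` is hypothetical, and every number in the example call (3.0 bits per weight, 60 layers, 8 KV heads, head dimension 128, 8-bit cache) is an illustrative assumption, not something stated in the comment above.

```python
GIB = 1024 ** 3


def estimate_vram_gib(n_params, bits_per_weight, n_layers, n_kv_heads,
                      head_dim, context_len, kv_bytes_per_elem):
    """Rough VRAM estimate: quantized weights + KV cache, both in GiB.

    Ignores activations, scratch buffers, and framework overhead,
    so real usage will be somewhat higher.
    """
    # Quantized weights: parameters * average bits per weight, converted to bytes.
    weight_bytes = n_params * bits_per_weight / 8
    # KV cache: one key and one value vector per layer, per KV head, per token.
    kv_bytes = (2 * n_layers * n_kv_heads * head_dim
                * context_len * kv_bytes_per_elem)
    return weight_bytes / GIB, kv_bytes / GIB


# Illustrative values only (assumptions, not taken from the comment):
# a 34B-parameter model with grouped-query attention, ~3 bits per weight,
# and an 8-bit KV cache (1 byte per element).
weights_gib, kv_gib = estimate_vram_gib(
    n_params=34e9,
    bits_per_weight=3.0,
    n_layers=60,
    n_kv_heads=8,
    head_dim=128,
    context_len=75_000,
    kv_bytes_per_elem=1,
)
total_gib = weights_gib + kv_gib
print(f"weights: {weights_gib:.1f} GiB, KV cache: {kv_gib:.1f} GiB, "
      f"total: {total_gib:.1f} GiB")
```

Under these assumed numbers the total stays below 24 GiB with a few GiB to spare. Raising the bits per weight or switching to a 16-bit cache would push it over, which is roughly the tradeoff between context length and quantization quality.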
|