by brucethemoose2 930 days ago
My recommendation is:

- ExUI with EXL2 files on good GPUs.

- KoboldCpp with GGUF files for small GPUs and Apple silicon.

There are many reasons, but in a nutshell these two are the fastest and most VRAM-efficient options.

I can fit 34B models with about 75K tokens of context on a single 24GB 3090 before the quality drop from quantization really starts to get dramatic.
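For a rough sense of how that budget splits, here is a back-of-the-envelope sketch. The comment gives no quantization level, cache precision, or model architecture, so every number below is an assumption chosen for illustration: about 3 bits per weight, an 8-bit KV cache, and a 34B model with 60 layers and 8 KV heads of dimension 128. The function name `estimate_vram_gib` is hypothetical, and the estimate ignores activations and framework overhead.

```python
def estimate_vram_gib(n_params, bits_per_weight, n_layers, n_kv_heads,
                      head_dim, context_tokens, kv_bytes_per_value):
    """Return (weights_gib, kv_cache_gib) for a quantized transformer.

    Rough estimate only: ignores activations, buffers, and runtime overhead.
    """
    gib = 2 ** 30
    # Quantized weights: parameters * bits per weight, converted to bytes.
    weight_bytes = n_params * bits_per_weight / 8
    # KV cache: one key and one value vector per layer, per KV head, per token.
    kv_bytes = (2 * n_layers * n_kv_heads * head_dim
                * context_tokens * kv_bytes_per_value)
    return weight_bytes / gib, kv_bytes / gib


# All values below are illustrative assumptions, not figures from the comment.
weights_gib, kv_gib = estimate_vram_gib(
    n_params=34e9,          # "34B" model
    bits_per_weight=3.0,    # assumed quantization level
    n_layers=60,            # assumed architecture
    n_kv_heads=8,           # assumed grouped-query attention
    head_dim=128,           # assumed head dimension
    context_tokens=75_000,  # "about 75K context"
    kv_bytes_per_value=1,   # assumed 8-bit KV cache
)
print(f"weights ~{weights_gib:.1f} GiB, KV cache ~{kv_gib:.1f} GiB, "
      f"total ~{weights_gib + kv_gib:.1f} GiB")
```

Under these assumptions the total lands a few GiB under 24GB. Raising the bits per weight or using a 16-bit cache quickly exceeds the card, which is the trade-off between context length and quantization quality.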

1 comment

Thanks! I will check out Koboldcpp.