| HN Mirror

That's a loaded question, because there's different approaches you can take to run these models. Basically, you want lots of memory (ram or vram), and the more you have, the larger the models can be that you run.

I'd recommend shooting for at least 13B models.

Use "oobabooga/text-generation-webui", which can also serve an OpenAI-compatible API as well as provide a chat interface. It can serve most models, using most methods.

Check out their system requirements page[0], and join some of the communities to learn more about what hardware will work best for you.

This person[1] is providing models of all sorts, in pretty much every optimized format. They also post the minimum RAM requirements for each of the GGML models, which are best if you want to host using CPU/RAM (no video card).

[0] https://github.com/oobabooga/text-generation-webui/blob/main...

[1] https://huggingface.co/TheBloke