Hacker News new | ask | show | jobs
by cwdz1 1106 days ago
This is great to hear. Thanks. I'll see if I can find them if only just for the sake of curiosity.
1 comments

You can use https://localai.io if you have a GPU or Apple Silicon CPU to serve up local models with an OpenAI-compatible API.
Do you know of a list of hardware recommendations to run this?
That's a loaded question, because there's different approaches you can take to run these models. Basically, you want lots of memory (ram or vram), and the more you have, the larger the models can be that you run.

I'd recommend shooting for at least 13B models.

Use "oobabooga/text-generation-webui", which can also serve an OpenAI-compatible API as well as provide a chat interface. It can serve most models, using most methods.

Check out their system requirements page[0], and join some of the communities to learn more about what hardware will work best for you.

This person[1] is providing models of all sorts, in pretty much every optimized format. They also post the minimum RAM requirements for each of the GGML models, which are best if you want to host using CPU/RAM (no video card).

[0] https://github.com/oobabooga/text-generation-webui/blob/main...

[1] https://huggingface.co/TheBloke