Hacker News
by firatsarlar 1155 days ago
Is this chasing the impossible? I don't mean to criticize, and I love the effort. But is it really possible, even a little, to run an LLM on a single machine? I want to believe :)
3 comments

GPTQ has been the missing piece: it allows quantizing the model weights from 16 bits to 4 bits with only a small loss in quality. That in turn allows running even the large 65 billion parameter version of the LLaMA model in ~33GB of RAM or VRAM.

With VRAM, that requires two 24GB GPUs, which is no longer completely out of reach.

The model running in the browser is a smaller version with 7 billion parameters, which is good enough for some things.
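
For what it's worth, the arithmetic behind those numbers can be sketched roughly as below. This is only a back-of-the-envelope estimate: it counts the weights alone and ignores activations, the KV cache, and any per-group quantization metadata, so real usage will be somewhat higher. The function names are made up for illustration and don't come from any library.

```python
import math


def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in decimal gigabytes.

    Assumption: every parameter is stored at the same bit width and
    nothing else (activations, KV cache, quantization scales) is counted.
    """
    return n_params * bits_per_weight / 8 / 1e9


def gpus_needed(memory_gb: float, vram_per_gpu_gb: float) -> int:
    """Smallest number of identical GPUs whose combined VRAM covers memory_gb."""
    return math.ceil(memory_gb / vram_per_gpu_gb)


# 65 billion parameters at 16 bits vs. 4 bits, and 7 billion at 4 bits.
llama_65b_fp16_gb = weight_memory_gb(65e9, 16)
llama_65b_4bit_gb = weight_memory_gb(65e9, 4)
llama_7b_4bit_gb = weight_memory_gb(7e9, 4)

# How many 24GB cards the 4-bit 65B weights would need.
cards_for_65b_4bit = gpus_needed(llama_65b_4bit_gb, 24)

if __name__ == "__main__":
    print(f"65B @ 16-bit: {llama_65b_fp16_gb:.1f} GB")
    print(f"65B @ 4-bit:  {llama_65b_4bit_gb:.1f} GB")
    print(f"7B  @ 4-bit:  {llama_7b_4bit_gb:.1f} GB")
    print(f"24GB GPUs for 65B @ 4-bit: {cards_for_65b_4bit}")
```

Under those assumptions, the 4-bit 65B weights come out a bit under the ~33GB mentioned above, which is why a single 24GB card isn't enough but two are.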

I just tried it and it works. And it works amazingly well compared to anything that existed anywhere on earth one school term ago! So yeah, why not?
I don't get where your question is coming from; you can already run LLMs on a single machine. Check out llama.cpp, tabby, text generation webui, gpt4all, AI Dungeon open source models like clover-edition, and now this WebGPU-based app.
The question comes from a kind of confusion. We know the requirements of LLMs. Given the hardware a big LLM currently runs on, how can we run one with only an 11GB graphics card? I really didn't mind!