	by firatsarlar 1202 days ago
	Is this chasing the impossible? I don't mean to criticize, and I love the effort. But is it really possible, even a little, to run an LLM on a single machine? I want to believe :)

3 comments

Tepix 1202 days ago

GPTQ has been the missing piece: it allows quantizing the model weights from 16 to 4 bits with only a small loss in quality. That in turn allows running even the large 65 billion parameter version of the LLaMA model in ~33GB of RAM or VRAM.
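To give a feel for what squeezing a weight into 4 bits means, here is a minimal sketch of plain round-to-nearest quantization. This is not GPTQ itself, which is more sophisticated and uses approximate second-order information to keep each layer's output error small; the function names and the sample weights are made up for illustration.

```python
def quantize_4bit(weights):
    """Map floats to integer codes 0..15 using one scale and offset per group."""
    lo, hi = min(weights), max(weights)
    # 4 bits give 16 levels, so 15 steps span the range; avoid a zero scale.
    scale = (hi - lo) / 15 or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_4bit(codes, scale, lo):
    """Recover approximate floats from the 4-bit codes."""
    return [c * scale + lo for c in codes]


# Hypothetical sample weights, not taken from any real model.
weights = [-0.42, -0.10, 0.03, 0.27, 0.51]
codes, scale, lo = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale, lo)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, f"max error {max_error:.4f}")
```

Each weight is stored as a code between 0 and 15 plus a shared scale and offset, and the rounding error per weight stays within half a step.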

With VRAM, that requires two 24GB GPUs, which is no longer completely out of reach.

The model running in the browser is a smaller version with 7 billion parameters, which is good enough for some things.
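The arithmetic behind these numbers can be sketched roughly as follows. It counts the weights only and assumes decimal gigabytes; activations, the KV cache, and quantization metadata such as scales are ignored, which is probably why real figures land a bit higher (~33GB rather than 32.5GB). The helper name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Parameter counts mentioned in the comment: 65 billion and 7 billion.
for name, n_params in [("65B", 65e9), ("7B", 7e9)]:
    fp16 = weight_memory_gb(n_params, 16)
    int4 = weight_memory_gb(n_params, 4)
    print(f"{name}: {fp16:.1f} GB at 16 bits, {int4:.1f} GB at 4 bits")

# Two 24GB GPUs give 48 GB in total, enough for the 4-bit 65B weights.
fits_on_two_gpus = weight_memory_gb(65e9, 4) <= 2 * 24
```

At 4 bits the 65B weights drop from about 130 GB to about 32.5 GB, and the 7B weights from about 14 GB to about 3.5 GB.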


quickthrower2 1202 days ago

I just tried it and it works. And it works amazingly well compared to anything that existed anywhere on earth one school term ago! So yeah, why not?


ranguna 1202 days ago

I don't get where your question is coming from; you can already run LLMs on a single machine. Check out llama.cpp, tabby, text generation webui, gpt4all, AI Dungeon open source models like clover-edition, and now this WebGPU-based app.


firatsarlar 1201 days ago

The question comes from a kind of confusion. We know the requirements of LLMs. Given the hardware a big LLM currently runs on, how can we run it with only an 11GB graphics card? I really didn't mind!
