Hacker News
by mlboss 1197 days ago
Is it possible to run the smallest one on a consumer GPU with 24 GB of RAM?
3 comments

You can do even better! You can run the second-smallest one, LLaMA-13B (better than GPT-3 175B), on 24 GB of VRAM. https://github.com/oobabooga/text-generation-webui/issues/14...
Running it is easy, but you'll probably want to fine-tune it, too.
I would be surprised if you can't. The smallest weight file is apparently 14 GB.
https://github.com/facebookresearch/llama/blob/main/FAQ.md#3

Looks like it needs 14 GB for the weights, and it isn't clear what the minimum size for the decoding cache is, but it defaults to settings for 30 GB GPUs.
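
For a rough sense of how the decoding cache scales, here is a minimal sketch. It assumes a standard key/value cache stored in fp16 (2 bytes per value) and the published LLaMA-7B shape (32 layers, 32 heads, head dimension 128). The batch size and sequence length below are illustrative assumptions, not values confirmed in this thread, and `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(n_layers, n_heads, head_dim,
                   max_batch_size, max_seq_len, bytes_per_value=2):
    """Estimate the size of a key/value decoding cache in bytes.

    Each layer keeps one key tensor and one value tensor (hence the
    leading factor of 2), each of shape
    (max_batch_size, max_seq_len, n_heads, head_dim).
    """
    return (2 * n_layers * max_batch_size * max_seq_len
            * n_heads * head_dim * bytes_per_value)


GB = 1e9
WEIGHTS_GB = 14  # fp16 weight size quoted in the thread for the smallest model

# Assumed LLaMA-7B shape: 32 layers, 32 heads, head_dim 128.
# The (batch, sequence length) pairs are illustrative assumptions.
for batch, seq_len in [(32, 1024), (1, 1024)]:
    cache_gb = kv_cache_bytes(32, 32, 128, batch, seq_len) / GB
    print(f"batch={batch:>2} seq_len={seq_len}: "
          f"cache ~{cache_gb:.2f} GB, total ~{WEIGHTS_GB + cache_gb:.2f} GB")
```

Under these assumptions, a batch of 32 puts the cache at roughly 17 GB on top of the 14 GB of weights, which is in line with the 30 GB figure, while a batch of 1 shrinks the cache to about half a gigabyte. Real usage will be higher because of activations and framework overhead.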

In int8, 7B needs only 9 GB of VRAM and 13B needs only 20 GB on a single GPU. https://github.com/oobabooga/text-generation-webui/issues/14...
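
The weight-only part of these numbers is simple arithmetic: parameter count times bytes per parameter. A minimal sketch, assuming the nominal parameter counts (7 and 13 billion; the real counts differ slightly) and treating the 9 GB and 20 GB figures quoted above as reported totals. `weight_gb` is a hypothetical helper, and the "overhead" column is just the difference between the reported total and the estimated weights.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2, "int8": 1}


def weight_gb(n_params_billion, dtype):
    """Weight memory in GB: (billions of params) * (bytes per param).

    1e9 params * N bytes / 1e9 bytes per GB simplifies to n_params_billion * N.
    """
    return n_params_billion * BYTES_PER_PARAM[dtype]


# Reported int8 VRAM totals from the thread (GB), keyed by nominal size.
reported_int8_total = {7: 9, 13: 20}

for size, total in reported_int8_total.items():
    w = weight_gb(size, "int8")
    print(f"{size}B: fp16 weights ~{weight_gb(size, 'fp16')} GB, "
          f"int8 weights ~{w} GB, reported int8 total {total} GB, "
          f"implied overhead ~{total - w} GB")
```

On this estimate, 13B in fp16 would need about 26 GB for the weights alone, which is why int8 is what makes it fit on a 24 GB card.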