Hacker News
by eightysixfour 1198 days ago
https://github.com/facebookresearch/llama/blob/main/FAQ.md#3

Looks like it needs 14 GB for the weights. It isn't clear what the minimum size for the decoding cache is, but the default settings target 30 GB GPUs.
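A rough back-of-the-envelope sketch of where those numbers come from. It assumes fp16 storage (2 bytes per value) and the 7B shape of 32 layers with a 4096-wide hidden state. The `max_seq_len` and `max_batch_size` values are hypothetical examples, not the repo's defaults. The decoding (key/value) cache stores one key and one value vector per layer, per position, per sequence in the batch.

```python
# Back-of-the-envelope memory estimate for a decoder-only transformer.
# Assumptions: fp16 (2 bytes per value); 7B shape = 32 layers, d_model 4096.

GB = 10**9  # decimal gigabytes


def weight_bytes(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory for the weights alone, ignoring runtime overhead."""
    return n_params * bytes_per_param


def kv_cache_bytes(n_layers: int, d_model: int, max_seq_len: int,
                   max_batch_size: int, bytes_per_value: int = 2) -> int:
    """Key/value cache: a key and a value vector of width d_model
    for every layer, every position, and every sequence in the batch."""
    return 2 * n_layers * d_model * max_seq_len * max_batch_size * bytes_per_value


weights = weight_bytes(7e9)
# Hypothetical settings: 512-token context, a single sequence.
cache = kv_cache_bytes(n_layers=32, d_model=4096, max_seq_len=512, max_batch_size=1)
print(f"weights: {weights / GB:.1f} GB, cache: {cache / GB:.2f} GB")
# prints "weights: 14.0 GB, cache: 0.27 GB"
```

The cache term is linear in both `max_seq_len` and `max_batch_size`, so the "minimum" depends entirely on those settings: halving either one halves the cache.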

In int8, 7B needs only 9 GB of VRAM and 13B needs only 20 GB, each on a single GPU. https://github.com/oobabooga/text-generation-webui/issues/14...
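A quick sketch of why int8 helps: it stores one byte per parameter instead of fp16's two, which halves the raw weight footprint. The parameter counts below are the nominal model sizes (an assumption), and the sketch ignores cache, activations and other runtime overhead. That overhead would account for the gap between these raw numbers and the 9 GB / 20 GB figures quoted above.

```python
# Raw weight memory at fp16 (2 bytes/param) vs int8 (1 byte/param).
# Assumption: nominal parameter counts (7e9, 13e9); no runtime overhead.

GB = 10**9  # decimal gigabytes


def weight_gb(n_params: float, bytes_per_param: int) -> float:
    """Raw weight memory in decimal GB."""
    return n_params * bytes_per_param / GB


for name, n_params in [("7B", 7e9), ("13B", 13e9)]:
    fp16 = weight_gb(n_params, 2)
    int8 = weight_gb(n_params, 1)
    print(f"{name}: fp16 {fp16:.0f} GB -> int8 {int8:.0f} GB")
# prints:
# 7B: fp16 14 GB -> int8 7 GB
# 13B: fp16 26 GB -> int8 13 GB
```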