More detail than you probably wanted: https://huggingface.co/blog/hf-bitsandbytes-integration
Also note that for a fixed memory (RAM) size, 4bit (even int4) is always superior, resulting in lower perplexity than 8bit.
E.g. LLaMA-13B int4 is far better/lower perplexity than LLaMA-7B fp8 while using the same amount of RAM.
Also note that for a fixed memory (RAM) size, 4bit (even int4) is always superior, resulting in lower perplexity than 8bit.
E.g. LLaMA-13B int4 is far better/lower perplexity than LLaMA-7B fp8 while using the same amount of RAM.