by waleedk 1136 days ago
[Author] You approximate the weights using fewer bits. You also switch from floats to ints, and then do some extra work when multiplying (rescaling the integer results, and keeping outlier values in higher precision) to make it all work together.

More detail than you probably wanted: https://huggingface.co/blog/hf-bitsandbytes-integration
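
A toy sketch of the "fewer bits + ints + rescale" idea, in case it helps. It assumes absmax scaling with one scale per row of the input and one per column of the weights (roughly the vector-wise scheme the linked post describes). It leaves out the outlier handling entirely, and the function names are made up for illustration. Real kernels use actual int8 tensors with int32 accumulators; plain Python ints stand in for those here.

```python
# Toy absmax int8 matmul. Assumptions: per-row scales for x, per-column
# scales for w, no outlier handling. Function names are hypothetical.

def quantize_rows(rows):
    """Quantize each row to integers in [-127, 127] with its own absmax scale."""
    q, scales = [], []
    for row in rows:
        absmax = max(abs(v) for v in row) or 1.0  # avoid dividing by zero
        scale = 127.0 / absmax
        q.append([round(v * scale) for v in row])
        scales.append(scale)
    return q, scales

def transpose(m):
    return [list(col) for col in zip(*m)]

def int8_matmul(x, w):
    """Approximate x @ w using integer operands, then undo the scaling."""
    qx, sx = quantize_rows(x)               # one scale per row of x
    qw_t, sw = quantize_rows(transpose(w))  # one scale per column of w
    out = []
    for i, row in enumerate(qx):
        out_row = []
        for j, col in enumerate(qw_t):
            acc = sum(a * b for a, b in zip(row, col))  # integer dot product
            out_row.append(acc / (sx[i] * sw[j]))        # rescale back to float
        out.append(out_row)
    return out

def float_matmul(x, w):
    """Reference full-precision matmul for comparison."""
    wt = transpose(w)
    return [[sum(a * b for a, b in zip(row, col)) for col in wt] for row in x]

x = [[0.5, -1.2, 3.3], [2.0, 0.1, -0.7]]
w = [[0.25, -0.5], [1.5, 0.75], [-0.3, 0.9]]
approx = int8_matmul(x, w)
exact = float_matmul(x, w)
print(approx)
print(exact)
```

The two printed matrices should agree to within a few hundredths. The error comes from rounding each value to one of 255 levels, which is why a single large outlier in a row hurts: it stretches the scale and wastes levels for everything else.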


The latest release of bitsandbytes uses a new fp4 format. 4-bit floating-point quantization results in much lower perplexity than int4.
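
One way to build intuition for why a float format can beat int4 at the same bit width: compare the two value grids. The sketch below assumes an E2M1-style fp4 (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit); that layout is an assumption for illustration, and bitsandbytes' actual fp4 encoding may differ. It only shows where the levels fall, not a perplexity measurement.

```python
# Compare a symmetric int4 grid with an assumed E2M1-style fp4 grid.
# Assumption: fp4 = sign(1) | exponent(2, bias 1) | mantissa(1).

def decode_e2m1(code):
    """Decode a 4-bit code (0..15) as sign|exp(2)|mantissa(1)."""
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:                      # subnormal: no implicit leading 1
        mag = man * 0.5
    else:
        mag = 2.0 ** (exp - 1) * (1 + man / 2)
    return sign * mag

FP4_GRID = sorted({decode_e2m1(c) for c in range(16)})  # +0 and -0 collapse
INT4_GRID = [float(i) for i in range(-7, 8)]            # symmetric int4

def quantize(values, grid):
    """Absmax-scale values onto the grid, snap to nearest level, scale back."""
    absmax = max(abs(v) for v in values) or 1.0
    scale = max(grid) / absmax
    return [min(grid, key=lambda g: abs(g - v * scale)) / scale for v in values]

print(FP4_GRID)
print(quantize([0.05, 1.0], FP4_GRID))   # small value survives (~0.083)
print(quantize([0.05, 1.0], INT4_GRID))  # small value flushed to 0.0
```

Both grids have 15 distinct levels, but the float grid packs them near zero (smallest nonzero step 0.5/6 of the absmax vs 1/7 for int4) and spreads them out near the max. Since most weights are small, that trade is plausibly why fp4 does better in practice.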

Also note that for a fixed memory (RAM) budget, 4-bit (even int4) is always superior: it gives lower perplexity than 8-bit.

E.g. LLaMA-13B in int4 has far lower perplexity than LLaMA-7B in fp8 while using about the same amount of RAM.
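
The "same amount of RAM" part is just params × bits / 8. A back-of-the-envelope sketch, assuming nominal parameter counts and counting weights only (it ignores per-block scaling constants, activations and the KV cache); the helper names are made up:

```python
# Weight-only memory arithmetic. Assumptions: nominal parameter counts,
# decimal GB, no overhead for scales/activations/KV cache.

def weight_gb(n_params, bits):
    """Memory for the weights alone, in decimal gigabytes."""
    return n_params * bits / 8 / 1e9

def params_that_fit(budget_gb, bits):
    """How many parameters fit in a given weight budget at a given bit width."""
    return budget_gb * 1e9 * 8 / bits

llama13b_4bit = weight_gb(13e9, 4)   # 6.5 GB
llama7b_8bit = weight_gb(7e9, 8)     # 7.0 GB
print(llama13b_4bit, llama7b_8bit)

# In a 7 GB weight budget, 4-bit fits twice as many parameters as 8-bit.
print(params_that_fit(7, 4), params_that_fit(7, 8))  # 14e9 vs 7e9
```

So for a fixed budget the choice is really "twice the parameters at half the precision", and the claim above is that the bigger model wins that trade.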