Hacker News new | ask | show | jobs
by astrange 1280 days ago
It'd be pretty surprising if you could quantize a text model and have it still work. It has to be using those lower bits to store text; it's not like you can round a letter up or down.
1 comments

It's not storing any text? The weights are floating point numbers - the "text" is in some extremely high dimensional embedding space.
Of course it's storing text. GPT was trained for less than one epoch; they just continually throw new text in there and it mostly just remembers it (= learns it = compresses it). It's not simply "a high dimensional embedding" because words aren't differentiable; you'll get different words if you round off your "coordinates".

If you go to https://beta.openai.com/playground/ and prompt it "Read me the book Alice in Wonderland" it will quote you word for word the original book.

GPT's compression of text is a model of probabilities for the next token in a sequence, where a token is a bit of text from a vocabulary of ~52,000. You can definitely reduce the precision of the parameters that determine that model without hurting the model's overall accuracy much (consider truncating a probability like 98.0000001221151240690% to 98.0%).

Empirically, people have quantized the weights of language models down to INT4 with very little loss in accuracy; see GLM-130B: https://arxiv.org/abs/2210.02414