by est 83 days ago
> I am confused what actually happens in the vectorized ADD and MULT instructions in the GPU with these quantized numbers.

I might be wrong, but I think an LLM is mostly about comparing distances between tokens. You can tell that -255 and +255 are far apart, but you can also tell that -8 and +8 are far apart, so a much coarser range still keeps the relative separation.
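A rough sketch of that intuition, not how any particular GPU kernel does it: quantize two made-up vectors to a tiny integer range (-7..+7) with plain symmetric scaling and check that their cosine similarity barely moves. The `quantize` and `cosine` helpers and the sample vectors are hypothetical, just for illustration.

```python
import math

def quantize(values, max_level):
    """Symmetric linear quantization: map floats to ints in [-max_level, max_level]."""
    # One scale per vector, chosen so the largest magnitude lands on max_level.
    scale = max(abs(v) for v in values) / max_level
    return [round(v / scale) for v in values], scale

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up example vectors (not from any real model).
a = [0.9, -0.7, 0.2, -1.0]
b = [0.8, -0.9, 0.1, -0.6]

qa, _ = quantize(a, 7)  # roughly a 4-bit signed range
qb, _ = quantize(b, 7)

print("float cosine:    ", cosine(a, b))
print("quantized cosine:", cosine(qa, qb))
```

The scale factor cancels out in the cosine, which is why only the rounding error matters here.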

Microsoft BitNet and Google TurboQuant show that in the extreme you can use just -1, 0, +1.
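To make the -1, 0, +1 case concrete, here is a minimal sketch, assuming an absmean-style scheme (divide by the mean absolute weight, round, clip) along the lines of what I understand BitNet b1.58 to describe. The function names and numbers are made up. The point for the ADD/MULT question is that the dot product needs only adds and subtracts, plus one multiply by the scale at the end.

```python
def ternarize(weights):
    """Absmean-style ternary quantization: scale by mean |w|, round, clip to {-1, 0, +1}."""
    # Fall back to 1.0 if every weight is zero, to avoid dividing by zero.
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    tern = [max(-1, min(1, round(w / scale))) for w in weights]
    return tern, scale

def ternary_dot(tern, scale, x):
    """Dot product with ternary weights: adds and subtracts only, one multiply at the end."""
    acc = 0.0
    for t, xi in zip(tern, x):
        if t == 1:
            acc += xi
        elif t == -1:
            acc -= xi
        # t == 0 contributes nothing, so the activation is skipped entirely
    return acc * scale

# Made-up weights and activations (not from any real model).
w = [0.8, -0.05, -0.9, 0.7]
x = [1.0, 2.0, 3.0, 4.0]

tern, scale = ternarize(w)
print("ternary weights:", tern)
print("ternary dot:    ", ternary_dot(tern, scale, x))
print("float dot:      ", sum(wi * xi for wi, xi in zip(w, x)))
```

On four numbers the approximation is crude. As far as I understand it, these schemes rely on training the model with the quantization in place and on very wide layers, not on rounding a finished float model.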