Hacker News
by redanddead 80 days ago
AI and graphics are matrices

Matrices are grids of numbers. A single row like [x,y,z] is a vector, and a matrix is a stack of those

GPUs are matrix processing units

Models are big matrices, and we quantize them (store each number with fewer bits) to make them small. That is lossy: the harder you quantize, the dumber the AI gets, but it lets you run inference on lesser hardware
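To make the "lossy" part concrete, here is a minimal sketch of round-trip quantization on a made-up list of weights. The `quantize` and `mse` helpers and the symmetric uniform grid are my own assumptions for illustration, not the scheme any particular model or paper uses.

```python
import random

def quantize(values, bits):
    """Snap values to a symmetric uniform grid and back (hypothetical helper)."""
    levels = 2 ** (bits - 1) - 1              # positive steps available at this bit width
    scale = max(abs(v) for v in values) / levels or 1.0
    return [round(v / scale) * scale for v in values]

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(1024)]  # stand-in for one row of a model

# Fewer bits means a coarser grid, so more information is thrown away.
errors = {bits: mse(weights, quantize(weights, bits)) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(bits, "bits -> mean squared error", err)
```

The error climbs as the bit width drops, which is the "dumber the harder you quantize" trade-off in miniature.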

What if you could quantize with less loss? You could make a model way smaller, or make much bigger models that run on less RAM

That is what they achieved here. They're not saying that scaling the matrices up or down by a scalar helps. They're saying that if you first transform the matrix with a function (i.e. rotate its dimensions by the same "random" rotation), you get matrices that survive quantization better, so smarter models fit in smaller boxes and need way less RAM to achieve the same performance

If we quantized just as aggressively but without that rotation/transform step, the drop in benchmarks would be much more noticeable
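Here is a rough sketch of why rotating first can help, under some loud assumptions: the weights are a toy vector with one planted outlier, and the "random" rotation is a random sign flip followed by a normalized Hadamard transform, which I am using as a cheap stand-in for a true random rotation. None of the names below come from the work being discussed.

```python
import math
import random

def quantize(values, bits):
    """Snap values to a symmetric uniform grid and back (hypothetical helper)."""
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / levels or 1.0
    return [round(v / scale) * scale for v in values]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def hadamard(vec):
    """Normalized fast Walsh-Hadamard transform: orthogonal and its own inverse."""
    out = list(vec)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for j in range(start, start + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(len(out))
    return [v / norm for v in out]

rng = random.Random(0)
n = 256                                   # length must be a power of two for this transform
weights = [rng.gauss(0.0, 1.0) for _ in range(n)]
weights[7] = 20.0                         # one outlier that stretches the quantization grid
signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # the shared "random" part

def rotate(vec):
    return hadamard([s * v for s, v in zip(signs, vec)])

def unrotate(vec):
    return [s * v for s, v in zip(signs, hadamard(vec))]

# Same 4-bit budget both times; only the rotation differs.
direct_err = mse(weights, quantize(weights, 4))
rotated_err = mse(weights, unrotate(quantize(rotate(weights), 4)))
print("direct:", direct_err, "rotated:", rotated_err)
```

The rotation smears the outlier across all the dimensions, so the grid no longer has to stretch to cover one huge value and the ordinary values get finer steps. Because the transform is orthogonal, undoing it after quantization does not add any error of its own.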

It's actually a huge breakthrough, and commercially it's probably only a short-term loss in valuation for the manufacturers