by nwoli 1196 days ago
It should also be mentioned that each weight isn’t really a 4-bit float. Rather, they’re basically clustering the floats into 2^4 = 16 clusters and storing each weight as a 4-bit index, then grabbing the float associated with that index from a lookup table as needed. So as long as the weights roughly fall into 16 clusters, you’ll get nearly identical results (exactly identical only if every weight sits right on its cluster’s value).
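To make the lookup-table idea concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not any particular library's format: the function names (`build_codebook`, `nearest`, `quantize`, `dequantize`) are made up for this example, the clustering is a simple 1-D k-means, and the demo weights are synthetic values deliberately drawn from 16 tight clusters.

```python
import random


def nearest(codebook, w):
    """Index of the codebook entry closest to weight w."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - w))


def build_codebook(weights, bits=4, iters=20):
    """1-D k-means (Lloyd's algorithm): returns 2**bits centroid floats."""
    k = 2 ** bits
    lo, hi = min(weights), max(weights)
    # Assumption: start with centroids evenly spaced over the weight range.
    codebook = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        sums = [0.0] * k
        counts = [0] * k
        for w in weights:
            idx = nearest(codebook, w)
            sums[idx] += w
            counts[idx] += 1
        # Move each centroid to the mean of its members; leave empty ones alone.
        codebook = [
            sums[i] / counts[i] if counts[i] else codebook[i] for i in range(k)
        ]
    return codebook


def quantize(weights, codebook):
    """Replace each float with the small integer index of its cluster."""
    return [nearest(codebook, w) for w in weights]


def dequantize(codes, codebook):
    """Look each index up in the table to recover an approximate float."""
    return [codebook[c] for c in codes]


# Synthetic demo: weights that really do fall into 16 tight clusters.
rng = random.Random(0)
centers = [-0.8 + 0.1 * i for i in range(16)]
weights = [c + rng.uniform(-0.001, 0.001) for c in centers for _ in range(50)]

codebook = build_codebook(weights)       # 16 floats, stored once
codes = quantize(weights, codebook)      # one 4-bit index per weight
restored = dequantize(codes, codebook)   # table lookup at use time

max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("max reconstruction error:", max_error)
```

Each stored code is just an integer from 0 to 15, so it fits in 4 bits, and the only floats kept around are the 16 table entries. The reconstruction error is bounded by how far a weight sits from its cluster's value, which is why tightly clustered weights round-trip almost unchanged.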