by sweis
562 days ago
I dug into this once, and the "theoretical ideal" of 3 originated in a 1950s paper about vacuum tube computers, which itself immediately backed off and said the choice of base 2 is frequently justified. https://sweis.medium.com/revisiting-radix-economy-8f642d9f3c... In this case, the context is {-1, 0, 1} weights in an LLM, which I don't think is being used for any hardware efficiency argument. I think it's just quantizing weights into 3 states.
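
For anyone who wants to see where the "ideal of 3" comes from, here is a rough sketch of the usual radix economy calculation: cost is taken to be the base times the number of digits needed to write a number. The function names `radix_economy` and `asymptotic_cost` are made up for this illustration, and the cost model is the textbook one, not necessarily the exact formulation in the 1950s paper.

```python
import math

def radix_economy(base: int, n: int) -> int:
    """Base times the number of digits needed to write n in that base."""
    digits = 0
    while n > 0:
        n //= base
        digits += 1
    # Treat 0 as a one-digit number.
    return base * max(digits, 1)

def asymptotic_cost(base: float) -> float:
    """Large-n cost per unit of ln(n): base / ln(base), minimized at e."""
    return base / math.log(base)

if __name__ == "__main__":
    for b in (2, 3, 4, 10):
        print(b, radix_economy(b, 999_999), round(asymptotic_cost(b), 3))
```

Under this model base 3 comes out only slightly ahead of base 2 (and base 4 ties base 2 asymptotically), so the margin is small enough that practical hardware concerns can easily decide the choice.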