|
|
|
|
|
by WhitneyLand
827 days ago
|
|
- Training isn’t done at 4-bits, to date this small size has only been for inference. - Research for a while now has been finding that smaller weights are surprisingly effective. It’s kind of a counterintuitive result, but one way to think about it is there are billions of weights working together. So taken as a whole you still have a large amount of information. |
|