|
|
|
|
|
by GaggiX
1077 days ago
|
|
bitsandbytes is only used during training with these models tho (the 8-bit Adamw) quantizing the weights and the activations to a range of 256 values when the model needs to output a range 256 values creates noticeable artifacts as they are not going to map 1-to-1. |
|