by Aurornis
59 days ago
> As this is a dense model and it's pretty sizable, 4-bit quantization can be nearly lossless

The 4-bit quants are far from lossless. The effects show up more on longer context problems.

> You can probably even go FP8 with 5090 (though there will be tradeoffs)

You cannot run these models at 8-bit on a 32GB card because you need space for context. Typically it would be Q5 on a 32GB card to fit context lengths needed for anything other than short answers.
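
To make the memory argument concrete, here is a rough back-of-the-envelope sketch. Every number in it is an assumption for illustration, not taken from the model under discussion: a hypothetical 27B-parameter dense model with 64 layers, 8 KV heads of dimension 128, an FP16 KV cache, a 32k-token context, and about 5.5 effective bits per weight for a Q5-style quant. It also ignores activations and runtime overhead, so real headroom is smaller than what it reports.

```python
GIB = 2**30  # bytes per GiB


def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the quantized weights alone."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size: one K and one V tensor per layer."""
    elems = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elems * bytes_per_elem / GIB


def fits(n_params: float, bits_per_weight: float, vram_gib: float, **kv) -> bool:
    """True if weights plus KV cache fit in the VRAM budget (overhead ignored)."""
    return weights_gib(n_params, bits_per_weight) + kv_cache_gib(**kv) <= vram_gib


# Hypothetical model shape and context length (assumptions, not real specs).
KV = dict(n_layers=64, n_kv_heads=8, head_dim=128, context_len=32768)
N_PARAMS = 27e9
VRAM_GIB = 32

for label, bpw in [("8-bit", 8.0), ("Q5 (~5.5 bpw, assumed)", 5.5)]:
    total = weights_gib(N_PARAMS, bpw) + kv_cache_gib(**KV)
    print(f"{label}: {total:.1f} GiB needed, fits={fits(N_PARAMS, bpw, VRAM_GIB, **KV)}")
```

Under these assumed numbers the 8-bit weights leave too little room for the cache, while the Q5-sized weights leave several GiB free. Changing `context_len` shows how quickly the cache eats into the budget.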