by jasonjmcghee
149 days ago
> if you want to understand the effects of quantization on model quality, it's really easy to spin up a GPU server instance and play around

Fwiw, not necessarily. I've noticed quantized models have strange and surprising failure modes: everything seems to be working well, and then the model goes into a death spiral repeating a specific word, or completely fails on one task out of a handful of similar tasks. The difference between 8-bit and 4-bit can be almost imperceptible or night and day. This isn't something you'd necessarily see while playing around, but it shows up when you're trying to do something specific.
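To make the "death spiral" case concrete, here is a rough sketch of how you might flag it automatically when running the same prompts through an 8-bit and a 4-bit model. It only looks at whether the output ends in a short phrase repeated back to back. The function names, the whitespace tokenization, and the thresholds (`max_ngram=4`, `min_repeats=5`) are all assumptions made for illustration, not anything from a particular library, and it won't catch the other failure mode (quietly failing one task out of several similar ones).

```python
def trailing_repeat_count(tokens, max_ngram=4):
    """Largest number of back-to-back repeats of any n-gram (n <= max_ngram)
    at the very end of `tokens`. Returns 1 when nothing repeats."""
    best = 1
    for n in range(1, max_ngram + 1):
        if len(tokens) < 2 * n:
            break  # not enough tokens for even two copies of an n-gram
        tail = tokens[-n:]
        count = 1
        pos = len(tokens) - 2 * n
        # Walk backwards one n-gram at a time while it matches the tail.
        while pos >= 0 and tokens[pos:pos + n] == tail:
            count += 1
            pos -= n
        best = max(best, count)
    return best


def looks_like_death_spiral(text, min_repeats=5, max_ngram=4):
    """Heuristic (assumed thresholds): True if the output ends in a short
    phrase repeated at least `min_repeats` times. Splits on whitespace,
    which is a simplification of real model tokenization."""
    return trailing_repeat_count(text.split(), max_ngram) >= min_repeats


if __name__ == "__main__":
    samples = [
        "quantized models are usually fine",
        "the answer is yes yes yes yes yes yes",
    ]
    for s in samples:
        print(looks_like_death_spiral(s), "|", s)
```

Running something like this over a batch of task-specific prompts, rather than a few ad hoc ones, is closer to the "trying to do something specific" situation where these failures tend to appear.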