by alex7o
50 days ago
Because when you pay for a subscription they don't silently quantize the model a few weeks after release, leaving you unable to run the full model. Otherwise there's no need for full fp16: int8 works 99% as well for half the memory, and the lower you go, the more you start to pay in quality for the quants. But int8 is super safe imo.
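
To put rough numbers on the "half the mem" point, here is a back-of-the-envelope sketch. It assumes weights dominate memory and ignores the KV cache, activations, and the per-block scale overhead that real quant formats carry. The 70B parameter count and the `weight_memory_gb` helper are purely illustrative, not tied to any particular model or library.

```python
# Bits stored per weight for each format (ignores quantization scales/zero-points).
BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gb(num_params: int, fmt: str) -> float:
    """Approximate memory for the weights alone, in decimal GB."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


if __name__ == "__main__":
    params = 70_000_000_000  # hypothetical 70B-parameter model
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt}: {weight_memory_gb(params, fmt):.0f} GB")
```

So for the same model, int8 needs half the weight memory of fp16, and int4 half again, which is where the quality tradeoff starts to bite.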