by brucethemoose2
1058 days ago
exLlama supports batching, and I believe it claws back much of the throughput lost to quantization (depending on the exact settings you quantize with). And as said below, whatever throughput you do lose is massively offset by being able to run on smaller, single GPUs.