|
|
|
|
|
by flutetornado
584 days ago
|
|
GPU workloads are either compute bound (floating point operations) or memory bound (bytes being transferred across memory hierarchy.) Quantizing in general helps with the memory bottleneck but does not help in reducing computational costs, so it’s not as useful for improving performance of diffusion models, that’s what it’s saying. |
|