by dulakian
469 days ago
I am using the Q6_K_L quant and it's running at about 40G of VRAM with the KV cache.

Device 1 [NVIDIA GeForce RTX 4090] MEM[||||||||||||||||||20.170Gi/23.988Gi]
Device 2 [NVIDIA GeForce RTX 4090] MEM[||||||||||||||||||19.945Gi/23.988Gi]
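
For anyone checking the "about 40G" figure, here is a quick sketch that adds up the two per-device readings quoted above. The variable names are just illustrative, and it assumes both cards report the same 23.988Gi capacity as shown in the monitor output.

```python
# Per-device memory readings (GiB), copied from the monitor output above.
used_gib = {"Device 1": 20.170, "Device 2": 19.945}
capacity_gib = 23.988  # capacity each RTX 4090 reports in that output

# Combined usage across both cards, and what is left over.
total_used = sum(used_gib.values())
total_capacity = capacity_gib * len(used_gib)
headroom = total_capacity - total_used

print(f"total used: {total_used:.3f} GiB of {total_capacity:.3f} GiB")
print(f"headroom:   {headroom:.3f} GiB")
```

The sum comes to roughly 40.1 GiB across the two cards, which matches the figure in the comment.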