by danielhanchen
388 days ago
For those interested, I made some 1-bit dynamic quants at https://huggingface.co/unsloth/DeepSeek-R1-0528-GGUF that are 74% smaller (713GB down to 185GB). Use the magic incantation -ot ".ffn_.*_exps.=CPU" to offload the MoE expert layers to RAM, allowing the non-MoE layers to fit in under 24GB of VRAM at 16K context! The rest sits in RAM and on disk.
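
To see which tensors that -ot pattern would catch, here is a minimal Python sketch. It assumes the part before "=" is applied as a regex search against each tensor name and the part after is the target buffer type, which is my understanding of llama.cpp's override-tensor option. The tensor names below are hypothetical, modeled on common GGUF naming for MoE models, and are not read from the actual file.

```python
import re

# Pattern copied verbatim from the comment above.
# Assumption: "<regex>=<buffer type>", split at the last "=".
override = ".ffn_.*_exps.=CPU"
pattern, buffer_type = override.rsplit("=", 1)
regex = re.compile(pattern)

# Hypothetical tensor names for illustration only.
tensor_names = [
    "blk.3.attn_q.weight",          # attention projection
    "blk.3.ffn_gate_exps.weight",   # routed expert weights
    "blk.3.ffn_up_exps.weight",     # routed expert weights
    "blk.3.ffn_down_exps.weight",   # routed expert weights
    "blk.3.ffn_gate_shexp.weight",  # shared expert weights
    "output.weight",                # output head
]

def placement(name: str) -> str:
    """Return the overridden buffer type if the pattern matches, else 'default'."""
    return buffer_type if regex.search(name) else "default"

placements = {name: placement(name) for name in tensor_names}

for name, where in placements.items():
    print(f"{name:30s} -> {where}")
```

In this sketch only the three `_exps` tensors are sent to CPU. The attention, shared-expert, and output tensors keep their default placement, which matches the idea of keeping the non-MoE weights in VRAM.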