The difference is that it uses safetensors, not GGUFs. But it does dynamically requantize to int4, int8 or bf16.
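The post doesn't say how the requantization works. Purely as an illustration of what "requantize to int4/int8" means, here's a minimal sketch of symmetric per-tensor integer quantization. This is an assumed scheme, not the tool's actual method (many real quantizers use finer per-channel or per-group scales), and `quantize_symmetric`/`dequantize` are hypothetical names. bf16 isn't shown because it's a narrower float format rather than an integer one.

```python
def quantize_symmetric(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Hypothetical helper: symmetric per-tensor quantization to signed ints.

    Assumes a single scale for the whole tensor (an illustrative choice).
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid divide-by-zero
    # Round to the nearest integer step and clamp into [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 0.0, 0.9]
    for bits in (8, 4):
        q, s = quantize_symmetric(w, bits)
        print(bits, q, [round(x, 3) for x in dequantize(q, s)])
```

Fewer bits means a coarser step (`scale`), so int4 halves memory again versus int8 at the cost of larger rounding error per weight.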
Go try it out now with a 35B model on your current hardware.
Right now I have qwen3.6-35B-A3B loaded with 128k context, a 2.5GB KV cache, thinking enabled, at int8.
It's using 11.5GB of VRAM and 42GB of system RAM.
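For a rough sense of why this needs both GPU and system memory, here's a back-of-envelope sketch: parameter count times bytes per parameter, plus the KV cache. It assumes a flat 35e9 parameters and ignores quantization scales, runtime buffers and other overhead, so it's a lower bound on weights + KV cache only, not a model of what the process actually allocates.

```python
# Back-of-envelope memory estimate: parameters x bytes per parameter.
# Assumptions: exactly 35e9 parameters; no quantization metadata,
# activations, runtime buffers or framework overhead are counted.
BYTES_PER_PARAM = {"int4": 0.5, "int8": 1.0, "bf16": 2.0}


def weight_gb(n_params: float, dtype: str) -> float:
    """Approximate weight memory in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


def total_gb(n_params: float, dtype: str, kv_cache_gb: float) -> float:
    """Weights plus KV cache; everything else is ignored."""
    return weight_gb(n_params, dtype) + kv_cache_gb


if __name__ == "__main__":
    n = 35e9
    kv = 2.5  # KV cache size reported in the post
    reported = 11.5 + 42.0  # VRAM + system RAM reported in the post
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: weights ~{weight_gb(n, dtype):.1f} GB, "
              f"with KV cache ~{total_gb(n, dtype, kv):.1f} GB")
    print(f"reported usage: {reported:.1f} GB")
```

At int8 that's roughly 37.5GB for weights plus KV cache, far more than 11.5GB of VRAM on its own, which is why spilling into system RAM matters here.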
I don't want to oversell it. Running everything on the GPU would be faster, but creating a semi-unified memory system is deffo a game changer for me.