| HN Mirror



	by mystraline 53 days ago
	Yep, that's what it does. It only works with NVIDIA. The difference is that it uses safetensors, not GGUFs. But it does dynamically requantize to int4, int8, or bf16.


Wow, that's actually sick as hell, somehow I hadn't heard of this. Maybe I will go and blow $700 on a new RAM kit... thanks for sharing!

Glad to share!

But go try it out now with a 35B model on your current hardware.

Right now I have qwen3.6-35B-A3B loaded: 128k context, 2.5GB KV cache, thinking enabled, int8.

It's using 11.5GB of GPU RAM and 42GB of system RAM.
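
For a rough sense of why the precision matters here, this is a back-of-envelope sketch of weight storage at the three precisions mentioned above. It assumes a round 35e9 parameters (taken from the "35B" in the model name, not a measured count) and counts weights only; KV cache, activations, quantization scales, and runtime overhead are ignored, so real usage will be higher. The function name is made up for illustration.

```python
# Bytes per parameter for each precision mentioned in the thread.
BYTES_PER_PARAM = {"int4": 0.5, "int8": 1.0, "bf16": 2.0}


def weight_footprint_gb(n_params: float, precision: str) -> float:
    """Approximate weight storage in GB (10**9 bytes), weights only."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    n_params = 35e9  # assumption: nominal size from the "35B" in the model name
    for precision in BYTES_PER_PARAM:
        gb = weight_footprint_gb(n_params, precision)
        print(f"{precision}: ~{gb:.1f} GB of weights")
```

By this estimate the int8 weights alone are larger than the 11.5GB sitting on the GPU, which fits with most of the model living in system RAM.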

I don't want to oversell it. All-GPU would be faster, but creating a semi-unified system is deffo a game changer for me.