by mystraline 2 days ago
Check out Krasis https://github.com/brontoguana/krasis

It enables something similar to unified memory. I've got a 5060 (16GB) card and 96GB of DDR5.

I can run qwen3.5-122b (int4) at 25 tok/sec. And it now even does image ingestion!
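This is why the system RAM matters. At int4 each parameter takes half a byte, so a 122B model's weights alone come to roughly 61GB, far more than a 16GB card can hold. The back-of-envelope sketch below assumes a naive "fill VRAM first, spill the rest to RAM" split. The 2GB VRAM reserve and the function names are made up for illustration, and this is not Krasis's actual placement logic. It also ignores quantization scales, the KV cache, and runtime overhead.

```python
def weight_footprint_gb(params_billion: float, bits: int) -> float:
    """Approximate raw weight size in GB (10^9 bytes): params * bits / 8.

    Ignores quantization scales, KV cache, activations and runtime overhead.
    """
    return params_billion * bits / 8


def split_across_memory(params_billion: float, bits: int, vram_gb: float,
                        ram_gb: float, vram_reserve_gb: float = 2.0) -> dict:
    """Hypothetical split: fill usable VRAM first, spill the remainder to system RAM.

    vram_reserve_gb is an assumed headroom for the CUDA context, buffers, etc.
    """
    total = weight_footprint_gb(params_billion, bits)
    usable_vram = max(vram_gb - vram_reserve_gb, 0.0)
    on_gpu = min(total, usable_vram)
    on_cpu = total - on_gpu
    return {
        "total_gb": total,
        "gpu_gb": on_gpu,
        "cpu_gb": on_cpu,
        "fits": on_cpu <= ram_gb,  # does the spilled part fit in system RAM?
    }


if __name__ == "__main__":
    # 122B params at int4 on a 16GB card with 96GB of system RAM
    print(split_across_memory(122, 4, vram_gb=16, ram_gb=96))
    # → {'total_gb': 61.0, 'gpu_gb': 14.0, 'cpu_gb': 47.0, 'fits': True}
```

For comparison, the same model at bf16 would need about 244GB for the weights alone. That is why the int4 requant is what makes it fit on this kind of box.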

I've been bulk transliterating and translating foreign-language books into English, all completely local.


Wait, so this lets me use my DDR5 and my VRAM combined? This is actually sick if so. Maybe I will actually have to go out and buy some more DDR5 (I currently only have 32GB...)
Yep, that's what it does. It only works with NVIDIA, though.

The difference is that it uses safetensors, not GGUFs. But it does dynamically requant to int4, int8, or bf16.
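To show roughly what "requant to int4 or int8" involves, here is a generic symmetric absmax quantizer for one small group of weights. Each group is stored as small signed integers plus one float scale. Krasis's actual scheme (group size, symmetric vs. asymmetric, packing) isn't described here, so treat this as an illustrative sketch with hypothetical function names. The bf16 option would skip this step entirely.

```python
def quantize_symmetric(values: list[float], bits: int) -> tuple[list[int], float]:
    """Symmetric absmax quantization of one group of floats to signed ints."""
    qmax = 2 ** (bits - 1) - 1  # 7 for int4, 127 for int8
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric range
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map the stored integers back to approximate float weights."""
    return [x * scale for x in q]


if __name__ == "__main__":
    group = [0.12, -0.5, 0.31, 0.07]  # made-up example weights
    for bits in (4, 8):
        q, scale = quantize_symmetric(group, bits)
        restored = dequantize(q, scale)
        max_err = max(abs(a - b) for a, b in zip(group, restored))
        print(bits, q, max_err)
```

The rounding error per weight is at most half a step (scale / 2). That step shrinks from absmax/7 at int4 to absmax/127 at int8, so the int8 option is noticeably more accurate at twice the memory.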

Wow, that's actually sick as hell; somehow I hadn't heard of this. Maybe I will go and blow $700 on a new RAM kit... thanks for sharing!
Glad to share!

But go try it out now with a 35B model on your current hardware.

Right now I have qwen3.6-35B-A3B loaded at int8, with 128k context, a 2.5GB KV cache, and thinking enabled.

It's using 11.5GB of GPU RAM and 42GB of system RAM.

I don't want to oversell it. All-GPU would be faster, but having a semi-unified memory setup is deffo a game changer for me.