HN Mirror



	by TacticalCoder 359 days ago
	> I think the fact that, as far as I understand, it takes 40GB of VRAM to run, is probably dampening some of the enthusiasm.

	40 GB of VRAM? So two GPUs with 24 GB each? That's pretty reasonable compared to the kind of machine needed to run the latest Qwen coder models (which btw are close to SOTA: they also beat proprietary models on several benchmarks).

2 comments

cellis 359 days ago

A 3090 + 2x Titan Xp? Technically I have 48 GB, but I don't think you can "split it" over multiple cards. At least with Flux, it would OOM the Titans and allocate the full 3090.
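
A rough sketch of the arithmetic here, assuming 24 GB for the 3090, 12 GB for each Titan Xp, and the 40 GB figure quoted above. The helper names are made up for illustration, and the check ignores activations, framework overhead, and how finely a model can actually be sharded, so treat it as back-of-the-envelope only.

```python
# Hypothetical helpers; all sizes are in GB and are assumptions, not measurements.

def fits_on_single_card(model_gb, cards_gb):
    """True if at least one card can hold the whole model by itself."""
    return max(cards_gb) >= model_gb


def fits_if_sharded(model_gb, cards_gb):
    """True if the pooled VRAM covers the model.

    Only meaningful when the runtime can really split weights across cards.
    """
    return sum(cards_gb) >= model_gb


cards = [24, 12, 12]  # assumed: one 3090 and two Titan Xp
model = 40            # assumed: the VRAM requirement quoted in the parent comment

print("total VRAM:", sum(cards))
print("fits on one card:", fits_on_single_card(model, cards))
print("fits if sharded:", fits_if_sharded(model, cards))
```

So the 48 GB total only helps if the software can shard the weights; without that, the largest single card (24 GB here) is the real limit.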


Auracle 359 days ago

You can’t split image models over 2 GPUs like you can LLMs.


BoredPositron 359 days ago

They also released an inference server for their models. Wan and qwen-image can be split without problems. https://github.com/modelscope/DiffSynth-Engine


Auracle 357 days ago

Unless I missed something, just from skimming their tutorial it looks like they can use parallelism to speed things up with some models, not actually split the model (apart from the usual chunk offloading techniques).
