by M4v3R 206 days ago
These are ~2 years behind state of the art from the looks of it. Still cool that they're releasing anything that's open for researchers to play with, but it's nothing groundbreaking.
3 comments

No, it is not as good as Veo, but better than Grok, I would say. Definitely better than what was available 2 years ago. And it is only a 7B research model!
But 7b is rather small no? Are other open weight video models also this small? Can this run on a single consumer card?
> But 7b is rather small no?

Sure, it's smallish.

> Are other open weight video models also this small?

Apple's models are weights-available, not open weights. And yes: WAN 2.1 has 1.3B models in addition to its 14B models, and WAN 2.2 has a 5B model in addition to its 14B models (the WAN 2.2 VAE used by Starflow-V is specifically the one used with the 5B model). Because the WAN models are largely true open-weights models (Apache 2.0 licensed), there are lots of downstream open-licensed derivatives.

> Can this run on a single consumer card?

Modern model runtimes like ComfyUI can run models that do not fit in VRAM on a single consumer card by swapping model layers between RAM and VRAM as needed, so even models bigger than this one can run on a single consumer card.
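
To make the layer-swapping idea concrete, here is a rough sketch. It is a toy model, not ComfyUI's actual algorithm: the helper names (`weight_bytes`, `run_with_offload`), the even 32-layer split, and the 8 GiB budget are all made up for illustration, and it only counts weights (no activations, no VAE, no text encoder).

```python
from collections import deque

GIB = 1024 ** 3


def weight_bytes(n_params: float, bytes_per_param: int = 2) -> int:
    """Approximate memory for the weights alone (fp16/bf16 = 2 bytes per parameter)."""
    return int(n_params * bytes_per_param)


def run_with_offload(layer_bytes: list[int], vram_budget: int) -> tuple[int, int]:
    """Walk the layers in order, evicting the oldest resident layer when VRAM is full.

    Hypothetical policy for illustration only.
    Returns (peak_resident_bytes, bytes_copied_from_ram) for one forward pass.
    """
    resident = deque()  # sizes of layers currently held in VRAM, oldest first
    used = peak = copied = 0
    for size in layer_bytes:
        if size > vram_budget:
            raise ValueError("a single layer exceeds the VRAM budget")
        # Free space by dropping the oldest layers (they stay available in RAM).
        while used + size > vram_budget:
            used -= resident.popleft()
        resident.append(size)
        used += size
        copied += size  # every layer is copied RAM -> VRAM once per pass
        peak = max(peak, used)
    return peak, copied


if __name__ == "__main__":
    total = weight_bytes(7e9)      # 7B parameters at 2 bytes each
    layers = [total // 32] * 32    # assumption: 32 equally sized layers
    budget = 8 * GIB               # assumption: VRAM left over for weights
    peak, copied = run_with_offload(layers, budget)
    print(f"weights: {total / GIB:.1f} GiB, peak resident: {peak / GIB:.1f} GiB")
    print(f"copied per pass: {copied / GIB:.1f} GiB")
```

The point is only that peak VRAM use is capped by the budget rather than by the model size; the cost is the extra RAM to VRAM traffic on every pass, which is one reason offloaded generation is slow.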

Wan 2.2: "This generation was run on an RTX 3060 (12 GB VRAM) and took 900 seconds to complete at 840 × 420 resolution, producing 81 frames." https://www.nextdiffusion.ai/tutorials/how-to-run-wan22-imag...
My guess is that they will lean towards smaller models and try to provide the best experience for running inference on device.
The interesting part is they chose to go with a normalizing flow approach, rather than the industry standard diffusion model approach. Not sure why they chose this direction as I haven’t read the paper yet.
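
For context, this is the generic textbook form of a normalizing flow, not anything taken from the Starflow-V paper. An invertible map $f_\theta$ sends data $x$ to a latent $z = f_\theta(x)$ with a simple prior $p_Z$, and the exact log-likelihood follows from the change-of-variables formula:

```latex
\log p_X(x) = \log p_Z\bigl(f_\theta(x)\bigr)
            + \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|
```

Training maximizes this quantity directly and sampling is $x = f_\theta^{-1}(z)$ with $z \sim p_Z$, whereas diffusion models are typically trained with a denoising objective and sampled by iterative denoising.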