Hacker News
by post-it 58 days ago
Thanks! Could it conceivably load the sub-models in series rather than in parallel? 8GB still won't be enough, but I wonder if those with 16GB could eke something out.

In theory, yes. The pipeline already does this to some extent with its low_vram mode, which offloads models to CPU between stages. The challenge at 16GB is that even a single 1.3B sub-model at fp32, plus activations, can push past what's available after macOS takes its share. Someone on an M1 iMac with 16GB did get geometry generation working, though (issue #5 on the repo), so 16GB is probably possible. 24GB gives comfortable headroom.
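
For a rough sense of the numbers, here's a back-of-envelope sketch of the weight footprint alone. The 1.3B parameter count and fp32 are from above; the fp16 line is only there for comparison and isn't something I'm claiming the pipeline supports. `weights_gib` is a made-up helper name, and activations, the other sub-models, and the macOS share are all left out.

```python
# Rough estimate of memory needed just to hold model weights.
# Ignores activations, framework overhead, and OS usage.

def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the size of the weights in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30

PARAMS = 1.3e9  # one 1.3B-parameter sub-model

fp32_gib = weights_gib(PARAMS, 4)  # fp32: 4 bytes per parameter
fp16_gib = weights_gib(PARAMS, 2)  # fp16: 2 bytes per parameter (comparison only)

print(f"fp32 weights: {fp32_gib:.2f} GiB")
print(f"fp16 weights: {fp16_gib:.2f} GiB")
```

So a single sub-model at fp32 is a bit under 5 GiB before any activations, which is why holding several at once is a problem at 16GB and why swapping them in one at a time helps.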