by dchuk 1174 days ago
So if I have a 32GB RAM Macbook Pro, and the instructions say this:

"Vicuna-13B This conversion command needs around 60 GB of CPU RAM."

Does this mean I simply cannot run that model at all? Or will it rip into HD swap or something to make the model weights and just take forever?

5 comments

Can someone explain why computing a delta needs to hold the entire model at once? Can't it just do one layer at a time?
Vicuna-13B loads and idles at ~26GB RAM usage on an M1 Max with 64GB. When answering questions, that grows to around 75GB, and yes, you can feel it (and the machine) slow down significantly when it starts hitting swap. I think realistically you'd want to stick to the 7B model on a 32GB machine (even if you could get the weight deltas to apply correctly).
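For what it's worth, the ~26GB idle figure lines up with simple arithmetic: parameter count times bytes per parameter. A rough sketch, assuming nominal parameter counts (13B and 7B) and 16-bit weights, neither of which is stated in this thread; `weight_bytes` is a made-up helper name:

```python
# Back-of-the-envelope weight memory: parameters x bytes per parameter.
# Assumptions (not from the thread): nominal parameter counts, fp16 weights.

GB = 10**9


def weight_bytes(n_params, bytes_per_param):
    """Bytes needed just to hold the weights, ignoring runtime overhead."""
    return n_params * bytes_per_param


fp16_13b_gb = weight_bytes(13 * 10**9, 2) / GB  # 26.0
fp16_7b_gb = weight_bytes(7 * 10**9, 2) / GB    # 14.0

# If a conversion step holds base and delta weights at the same time,
# the weights alone come to twice the single-model figure.
both_13b_gb = 2 * fp16_13b_gb                    # 52.0

print(fp16_13b_gb, fp16_7b_gb, both_13b_gb)
```

Two full 13B copies at 52GB is in the same ballpark as the "around 60 GB" in the instructions, though how the remaining gap breaks down is a guess. The 7B weights at about 14GB would leave headroom on a 32GB machine before any inference overhead.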
I just reached that step on my Linux laptop which has 32GB of RAM. I'm about to give it a try anyway, but I'm not hopeful based on that comment.

I'm wondering if anyone is torrenting these Vicuna-13B weights?

Someone really needs to write a script that does not load both entire models into memory to do this.
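A minimal sketch of what such a script could look like, to show the idea only. It assumes a hypothetical layout where each tensor lives in its own raw float32 file with the same name in the base and delta directories. The real checkpoints are not stored this way, so the names `read_tensor`, `write_tensor`, and `apply_delta_streaming` are illustrative and this is not the project's actual conversion code:

```python
import os
import tempfile
from array import array


def read_tensor(path):
    """Read one flat float32 tensor from a raw binary file (toy format)."""
    values = array("f")
    with open(path, "rb") as f:
        values.frombytes(f.read())
    return values


def write_tensor(path, values):
    """Write one flat float32 tensor as raw bytes (toy format)."""
    with open(path, "wb") as f:
        f.write(values.tobytes())


def apply_delta_streaming(base_dir, delta_dir, out_dir):
    """Write base + delta into out_dir, one tensor at a time.

    Only one base tensor, one delta tensor, and their sum are in memory
    at any moment, so peak usage scales with the largest tensor rather
    than with the whole model.
    """
    os.makedirs(out_dir, exist_ok=True)
    for name in sorted(os.listdir(delta_dir)):
        base = read_tensor(os.path.join(base_dir, name))
        delta = read_tensor(os.path.join(delta_dir, name))
        if len(base) != len(delta):
            raise ValueError(f"size mismatch for tensor {name}")
        merged = array("f", (b + d for b, d in zip(base, delta)))
        write_tensor(os.path.join(out_dir, name), merged)
        # base, delta, and merged are rebound on the next iteration,
        # so the previous tensors can be garbage collected.
```

The addition itself is elementwise, so nothing about the math requires both models to be resident at once. Whether this works in practice depends on the checkpoint format allowing tensors to be read individually.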
You can try the smaller 7B version.