by TrinaryWorksToo 1458 days ago
Have you looked into HuggingFace Accelerate? People have reportedly used it to trade speed for lower memory requirements. You still need to download the huge models, though.
2 comments

Can confirm. HuggingFace Accelerate's big model feature[1] has some limits, but it does work. I used it to run a 40GB model on a system with just 20GB of free RAM and a 10GB GPU.
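
To make those numbers concrete, here is a rough sketch of how a 40GB model could be split across a 10GB GPU and 20GB of RAM, with the remainder left on disk. It is a toy greedy placement, not Accelerate's actual device-map logic, which does more than this. The `place_layers` helper and the 40 equal 1GB layers are assumptions made for illustration.

```python
def place_layers(layer_sizes_gb, budgets_gb):
    """Assign each layer, in order, to the first device that still has room.

    budgets_gb maps device name to capacity in GB, fastest device first.
    Anything that fits nowhere falls back to "disk".
    """
    remaining = dict(budgets_gb)
    devices = list(remaining)
    placement = []
    idx = 0  # index of the device currently being filled
    for size in layer_sizes_gb:
        # Move on to the next device once the current one is full.
        while idx < len(devices) and remaining[devices[idx]] < size:
            idx += 1
        if idx == len(devices):
            placement.append("disk")
        else:
            remaining[devices[idx]] -= size
            placement.append(devices[idx])
    return placement


# Hypothetical stand-in for the 40GB model: 40 layers of 1 GB each.
placement = place_layers([1] * 40, {"gpu": 10, "cpu": 20})
counts = {d: placement.count(d) for d in ("gpu", "cpu", "disk")}
print(counts)  # {'gpu': 10, 'cpu': 20, 'disk': 10}
```

Under these assumptions a quarter of the model ends up on disk and has to be read back in on every forward pass, which likely explains much of the slowdown.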

All I had to do was prepare the weights in the format Accelerate understands, then load the model with Accelerate. After that, all the rest of the model code worked without any changes.
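
For anyone who wants to try it, the loading step might look roughly like the sketch below. It follows the pattern in the big model docs[1] (`init_empty_weights` plus `load_checkpoint_and_dispatch`), but the exact calls used for the run described here are not stated, so treat it as an assumption. The names `config_name`, `weights_dir`, `offload_dir`, and the two helper functions are placeholders, not part of any library.

```python
def memory_budget(gpu_gib, cpu_gib):
    """Build a max_memory mapping for GPU index 0 and CPU RAM."""
    return {0: f"{gpu_gib}GiB", "cpu": f"{cpu_gib}GiB"}


def load_big_model(config_name, weights_dir, offload_dir, max_memory):
    """Load a checkpoint across GPU, CPU RAM, and disk (hypothetical helper)."""
    # Imported here so memory_budget() can be used without these libraries.
    from accelerate import init_empty_weights, load_checkpoint_and_dispatch
    from transformers import AutoConfig, AutoModelForCausalLM

    config = AutoConfig.from_pretrained(config_name)
    # Build the model skeleton without allocating memory for its weights.
    with init_empty_weights():
        model = AutoModelForCausalLM.from_config(config)
    # Load the weights and spread them over the devices within the budget;
    # whatever does not fit is offloaded to offload_dir on disk.
    return load_checkpoint_and_dispatch(
        model,
        checkpoint=weights_dir,
        device_map="auto",
        max_memory=max_memory,
        offload_folder=offload_dir,
    )


# Budget matching the machine described above: 10GB GPU, 20GB free RAM.
budget = memory_budget(10, 20)
print(budget)  # {0: '10GiB', 'cpu': '20GiB'}
```

Calling `load_big_model(...)` with that budget should return a model that can be used like any other, which matches the "no changes to the model code" experience.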

But it is incredibly slow. A 20 billion parameter model took about a half hour to respond to a prompt and generate 100 tokens. A 175 billion parameter model like Facebook's would probably take hours.
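
As a back-of-the-envelope check on "hours", here is the arithmetic, assuming time per token scales linearly with parameter count on the same hardware. That is a crude assumption: a larger model would push more weights to disk, so the real figure could be worse. The function names are made up for this sketch.

```python
def seconds_per_token(total_minutes, tokens):
    """Average generation time per token, in seconds."""
    return total_minutes * 60 / tokens


def scaled_hours(base_params_b, base_minutes, target_params_b, tokens):
    """Estimate hours for the same token count on a larger model,
    assuming per-token time grows linearly with parameter count."""
    per_token = seconds_per_token(base_minutes, tokens)
    scaled = per_token * target_params_b / base_params_b
    return scaled * tokens / 3600


# 20B parameters, 100 tokens in about 30 minutes.
print(seconds_per_token(30, 100))      # 18.0
# Same 100 tokens on a 175B-parameter model.
print(scaled_hours(20, 30, 175, 100))  # 4.375
```

So roughly 18 seconds per token at 20B, and somewhere above four hours for 100 tokens at 175B under that linear assumption.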

1: https://huggingface.co/docs/accelerate/big_modeling

Thank you for the pointer. I've been poking at it with a fork for the past few hours, and realized I forgot to respond.