by nmitchko
1259 days ago
You can split the model across devices with the Hugging Face accelerate library. Check out the infer_auto_device_map method, which works out where to place the model's layers for your configuration (multiple GPUs, RAM, NVMe), and then run dispatch_model with that device map.
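
To make the idea of a device map concrete, here is a rough standard-library sketch of what such a map looks like and how one could be filled in. It is not accelerate's implementation: `sketch_device_map`, the layer names, and the memory numbers are all made up for illustration, and the first-fit rule is a simplifying assumption. In accelerate itself you would get the map from infer_auto_device_map and hand it to dispatch_model through its device_map argument.

```python
from typing import Dict, List, Tuple


def sketch_device_map(
    layer_sizes: List[Tuple[str, int]],
    max_memory: Dict[str, int],
) -> Dict[str, str]:
    """Hypothetical helper: walk the layers in order and place each one on
    the current device, moving on to the next device once the budget is used up.
    Sizes and budgets are in arbitrary units (say, GiB)."""
    devices = list(max_memory.items())  # keeps the order the caller gave
    device_map: Dict[str, str] = {}
    idx = 0   # index of the device currently being filled
    used = 0  # memory already assigned on that device
    for name, size in layer_sizes:
        # Skip ahead until the layer fits on some remaining device.
        while idx < len(devices) and used + size > devices[idx][1]:
            idx += 1
            used = 0
        if idx == len(devices):
            raise MemoryError(f"no device has room for {name}")
        device_map[name] = devices[idx][0]
        used += size
    return device_map


# Made-up layer sizes and per-device budgets, fastest device first.
layers = [("embed", 2), ("block.0", 4), ("block.1", 4), ("block.2", 4), ("lm_head", 2)]
budget = {"cuda:0": 6, "cuda:1": 8, "cpu": 16}

device_map = sketch_device_map(layers, budget)
print(device_map)
```

Listing the budgets from fastest to slowest device means layers only spill to CPU (or disk) after the GPUs are full, which is the general shape of placement you want when a model does not fit on one card.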