	by nmitchko 1306 days ago
	You can split the model across devices with the Hugging Face accelerate library. Check out the infer_auto_device_map function, which works out where each part of the model should live for your configuration (multi-GPU, RAM, NVMe), and then run dispatch_model with that device map.