	by p1esk 2818 days ago
	Model parallelism is also useful in situations where your model (and/or your inputs) is so large that even with batch_size=1 it does not fit in GPU memory (especially if you're still using a 1080Ti). However, other techniques might help here (e.g. gradient checkpointing, or dropping parts of your graph to INT8).
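
	To make the memory argument concrete, here is a rough back-of-the-envelope sketch in plain Python. Everything in it is an assumption for illustration: the function names, the hypothetical model (500M parameters, 100 layers, 128 MiB of activations per layer at batch_size=1), FP32 storage, and the usual sqrt(n) checkpointing schedule. It counts only weights, gradients and stored activations, and treats the 1080Ti's 11 GB as 11 GiB, so read the numbers as an order-of-magnitude estimate rather than a measurement.

```python
import math

GIB = 1024 ** 3
MIB = 1024 ** 2
FP32_BYTES = 4
# Assumption: the 1080 Ti's 11 GB is treated as 11 GiB for simplicity.
GPU_MEMORY_BYTES = 11 * GIB


def training_bytes(n_params, n_layers, act_bytes_per_layer, checkpoint=False):
    """Very rough peak training memory at batch_size=1.

    Counts FP32 weights and gradients plus stored activations.
    Ignores optimizer state, workspace buffers and fragmentation.
    """
    weights_and_grads = 2 * n_params * FP32_BYTES
    if checkpoint:
        # Assumed sqrt(n) schedule: keep one checkpoint per segment, and
        # hold one segment's activations while it is recomputed during
        # the backward pass.
        segments = math.ceil(math.sqrt(n_layers))
        stored_layers = segments + math.ceil(n_layers / segments)
    else:
        # Without checkpointing every layer's activations are kept.
        stored_layers = n_layers
    return weights_and_grads + stored_layers * act_bytes_per_layer


def fits(n_bytes):
    """True if the estimate is within the assumed GPU memory budget."""
    return n_bytes <= GPU_MEMORY_BYTES


# Hypothetical model, not taken from any real network.
plain = training_bytes(500_000_000, 100, 128 * MIB)
ckpt = training_bytes(500_000_000, 100, 128 * MIB, checkpoint=True)

print(f"no checkpointing: {plain / GIB:.1f} GiB, fits: {fits(plain)}")
print(f"checkpointing:    {ckpt / GIB:.1f} GiB, fits: {fits(ckpt)}")
```

	Under these assumptions the plain estimate lands well above the 11 GiB budget, while the checkpointed one fits, at the cost of roughly one extra forward pass of recomputation. If the checkpointed estimate still did not fit, that is the case where splitting the model across GPUs becomes the remaining option.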