HN Mirror



	by patresh 1384 days ago
	If you need larger batch sizes but don't have the VRAM for them, have a look at gradient accumulation (https://kozodoi.me/python/deep%20learning/pytorch/tutorial/2...). You accumulate the gradients of several smaller batches before doing the weight update step, which gives you an effective batch size much larger than what fits in GPU memory at once.
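
	To make the idea concrete, here is a minimal sketch in plain Python rather than PyTorch, so it runs without a GPU or any libraries. It assumes a toy one-parameter linear model with a mean-squared-error loss; the function names (grad_mse, accumulated_step, full_batch_step) and the data are made up for illustration and are not taken from the linked tutorial. It shows that summing scaled micro-batch gradients and then doing a single update gives the same weight as one update on the full batch.

```python
# Toy model: y_hat = w * x, loss = mean((y_hat - y)^2).
# All names and numbers here are hypothetical, chosen for illustration only.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_step(w, xs, ys, micro_batch_size, lr):
    """One weight update built from several micro-batches.

    Assumes len(xs) is an exact multiple of micro_batch_size.
    """
    num_micro = len(xs) // micro_batch_size
    grad = 0.0
    for i in range(num_micro):
        lo = i * micro_batch_size
        hi = lo + micro_batch_size
        # Divide by the number of micro-batches so the accumulated sum
        # equals the mean gradient over the full batch.
        grad += grad_mse(w, xs[lo:hi], ys[lo:hi]) / num_micro
    # The weight is updated once, after all micro-batches were processed.
    return w - lr * grad


def full_batch_step(w, xs, ys, lr):
    """Reference: one weight update computed on the whole batch at once."""
    return w - lr * grad_mse(w, xs, ys)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]

w_accum = accumulated_step(0.0, xs, ys, micro_batch_size=2, lr=0.01)
w_full = full_batch_step(0.0, xs, ys, lr=0.01)
print(w_accum, w_full)
```

	The two results agree here because the loss is a mean over samples and the micro-batches are equal in size. In a real network, layers that compute per-batch statistics (BatchNorm, for example) still see only the small micro-batch, so the match with a true large batch is not exact.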

1 comment

stephanst 1384 days ago

Yep, this is a very valid point, and I need to look into it more. It means rebuilding a lot of my toolchain, but I think it would ultimately be worth the time investment!
