Hacker News
by patresh 1337 days ago
If you need larger batch sizes but don't have the VRAM for it, have a look at gradient accumulation (https://kozodoi.me/python/deep%20learning/pytorch/tutorial/2...).

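You can accumulate the gradients of several batches before doing the weight update step. This effectively lets you train with much larger batch sizes than your GPU's memory would otherwise allow: if you accumulate k micro-batches of size b, each weight update uses an effective batch size of k × b.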
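
A minimal sketch of the idea, assuming a toy 1-D linear model trained with mean-squared-error loss, written in plain Python so no framework is needed. The helper names (`grad_mse`, `step_full_batch`, `step_accumulated`) are hypothetical and only for illustration. It compares one update computed on the full batch with one update assembled from smaller micro-batches. Scaling each micro-batch gradient by 1/k (k = number of micro-batches) makes the accumulated gradient equal to the full-batch mean.

```python
# Hypothetical sketch: gradient accumulation on a toy model y = w*x + b
# with mean-squared-error loss. Pure Python, no framework assumed.

def grad_mse(w, b, xs, ys):
    """Return (dL/dw, dL/db) of the mean squared error over one batch."""
    n = len(xs)
    dw = db = 0.0
    for x, y in zip(xs, ys):
        err = (w * x + b) - y
        dw += 2.0 * err * x / n
        db += 2.0 * err / n
    return dw, db


def step_full_batch(w, b, xs, ys, lr):
    """One SGD update using the whole batch at once (needs the most memory)."""
    dw, db = grad_mse(w, b, xs, ys)
    return w - lr * dw, b - lr * db


def step_accumulated(w, b, xs, ys, lr, micro_batch):
    """One SGD update built from several smaller micro-batches.

    Only `micro_batch` examples are processed at a time. Each micro-batch
    gradient is scaled by 1/accum_steps so the sum equals the full-batch
    mean (this sketch assumes equal-sized micro-batches).
    """
    assert len(xs) % micro_batch == 0, "sketch assumes equal-sized micro-batches"
    accum_steps = len(xs) // micro_batch
    acc_dw = acc_db = 0.0
    for i in range(0, len(xs), micro_batch):
        dw, db = grad_mse(w, b, xs[i:i + micro_batch], ys[i:i + micro_batch])
        acc_dw += dw / accum_steps
        acc_db += db / accum_steps
    # A single weight update, applied only after every micro-batch is done.
    return w - lr * acc_dw, b - lr * acc_db


if __name__ == "__main__":
    xs = [float(i) for i in range(8)]
    ys = [3.0 * x + 1.0 for x in xs]
    full = step_full_batch(0.0, 0.0, xs, ys, lr=0.01)
    accum = step_accumulated(0.0, 0.0, xs, ys, lr=0.01, micro_batch=2)
    # The two updates should agree up to floating-point rounding.
    print(full, accum)
```

In PyTorch the same pattern falls out of the fact that `loss.backward()` adds into each parameter's `.grad` until it is zeroed: you divide each micro-batch loss by k, call `backward()` on every micro-batch, and only call `optimizer.step()` and `optimizer.zero_grad()` every k iterations. One caveat: layers that compute batch statistics, such as BatchNorm, still only see the micro-batch, so accumulation is not exactly equivalent for those.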

Yep, this is a very valid point, and I need to look into it more. That means rebuilding a lot of my toolchain, but I think it would ultimately be worth the time investment!