by p1esk
2818 days ago
Model parallelism is also useful in situations where your model (and/or your inputs) are so large that even with batch_size=1 they do not fit in GPU memory (especially if you're still using a 1080Ti). However, other techniques might help here (e.g. gradient checkpointing, or quantizing parts of your graph to INT8).
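
To make the gradient checkpointing idea concrete, here is a rough pure-Python sketch on a toy chain of scalar layers. Everything in it (the `tanh` layers, the function names, the `segment` parameter, the way stored values are counted) is made up for illustration and is not any framework's API. The point is only that you keep a few activations during the forward pass and recompute the rest segment by segment during the backward pass, trading extra compute for less memory.

```python
import math


def layer(w, x):
    # Toy "layer": a scalar tanh unit (an assumption for illustration).
    return math.tanh(w * x)


def layer_grad(w, x):
    # Derivative of tanh(w * x) with respect to x.
    t = math.tanh(w * x)
    return w * (1.0 - t * t)


def grad_full(weights, x0):
    # Ordinary backprop: store the input of every layer.
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    g = 1.0
    for w, xin in zip(reversed(weights), reversed(inputs)):
        g *= layer_grad(w, xin)
    return g, len(inputs)


def grad_checkpointed(weights, x0, segment):
    # Forward pass: keep only the input of each segment of layers.
    checkpoints = []
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints.append(x)
        x = layer(w, x)

    g = 1.0
    peak = 0  # rough count of values held at once
    # Backward pass: go through segments last to first, recomputing
    # each segment's layer inputs from its checkpoint.
    for s in reversed(range(len(checkpoints))):
        seg_weights = weights[s * segment:(s + 1) * segment]
        inputs = []
        x = checkpoints[s]
        for w in seg_weights:
            inputs.append(x)
            x = layer(w, x)
        peak = max(peak, len(checkpoints) + len(inputs))
        for w, xin in zip(reversed(seg_weights), reversed(inputs)):
            g *= layer_grad(w, xin)
    return g, peak


weights = [0.5 + 0.1 * i for i in range(16)]
g_full, stored_full = grad_full(weights, 0.3)
g_ckpt, stored_ckpt = grad_checkpointed(weights, 0.3, segment=4)
print(g_full, stored_full)
print(g_ckpt, stored_ckpt)
```

Both functions return the same gradient, but the checkpointed one holds fewer values at its peak, at the cost of roughly one extra forward pass. In a real network the stored values are large activation tensors rather than scalars, which is where the memory saving matters.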