by haldujai
1262 days ago
If you can’t fit the model on your resources, you can leverage DeepSpeed’s ZeRO-Offload, which will let you train GPT-2 on a single V100 (32 GB). Alternatively, if you’re doing research (with the caveat that you have to publish, open-source, or share your results in a blog post), you can also get access to Google’s TPU Research Cloud, which gives you a few v3-8s for 30 days (you can’t do distributed training across devices, but you can run workloads in parallel). You can also ask nicely for a pod; I’ve been granted access to a v3-32 for 14 days pretty trivially, which (if optimized) has more throughput than 8xA100 on transformer models. TPUs, and pods even more so, are a bit harder to work with, and TF performs far better than PyTorch on them.

https://www.deepspeed.ai/tutorials/zero-offload/

https://medium.com/analytics-vidhya/googles-tpu-research-clo...
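
For anyone wondering what turning on ZeRO-Offload looks like in practice, here is a rough sketch of a DeepSpeed config with ZeRO stage 2 and the optimizer state offloaded to CPU. The helper name `build_zero_offload_config` and the batch size and accumulation values are my own placeholders, not numbers from the tutorial, so tune them for your model and GPU.

```python
import json


def build_zero_offload_config(micro_batch_size=4, grad_accum_steps=8):
    """Build a DeepSpeed config dict with ZeRO stage 2 and CPU optimizer offload.

    The function name and the default values are illustrative assumptions.
    """
    return {
        # Per-GPU batch size for a single forward/backward pass.
        "train_micro_batch_size_per_gpu": micro_batch_size,
        # Accumulate gradients to reach a larger effective batch size.
        "gradient_accumulation_steps": grad_accum_steps,
        # Mixed precision; V100s support fp16 but not bf16.
        "fp16": {"enabled": True},
        "zero_optimization": {
            # Stage 2 partitions optimizer state and gradients.
            "stage": 2,
            # Keep optimizer state in host memory instead of on the GPU.
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
        },
    }


if __name__ == "__main__":
    # Print the config as JSON so it can be saved to a file such as ds_config.json.
    print(json.dumps(build_zero_offload_config(), indent=2))
```

You would then hand the dict (or the saved JSON file) to `deepspeed.initialize` through its `config` argument. Older DeepSpeed releases spelled the offload setting differently, so check the tutorial linked above against the version you have installed.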