by haldujai 1262 days ago
If you can’t fit the model on your resources, you can leverage DeepSpeed’s ZeRO-Offload, which will let you train GPT-2 on a single V100 (32 GB).
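For anyone who hasn't seen what that looks like in practice, here is a minimal sketch of a DeepSpeed config that turns on ZeRO stage 2 with the optimizer offloaded to CPU. The batch size and gradient accumulation values are placeholders I picked for illustration, not tuned numbers, and fp16 is an assumption about your setup; check the tutorial linked below for the current options.

```python
import json

# Sketch of a DeepSpeed config enabling ZeRO-Offload.
# Batch size and accumulation steps are illustrative placeholders, not recommendations.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,   # assumed value; lower it if you run out of memory
    "gradient_accumulation_steps": 8,      # assumed value
    "fp16": {"enabled": True},             # assumes mixed-precision training on the V100
    "zero_optimization": {
        "stage": 2,                        # partition optimizer state and gradients
        "offload_optimizer": {
            "device": "cpu",               # keep optimizer state in host memory
            "pin_memory": True,            # pinned host memory for faster transfers
        },
    },
}

# Serialize so it can be saved as a JSON config file if you prefer that route.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

You would then hand this dict (or the JSON saved to a file) to DeepSpeed when wrapping your model, e.g. through the `config` argument of `deepspeed.initialize`. Whether a given GPT-2 size actually fits still depends on your sequence length and batch size.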

Alternatively, if you’re doing research (with the caveat that you have to publish, open-source, or share your results in a blog post), you can get access to Google’s TPU Research Cloud, which gives you a few v3-8s for 30 days. You can’t do distributed training across them, but you can run workloads in parallel. You can also ask nicely for a pod: I’ve been granted access to a v3-32 for 14 days pretty trivially, which (if optimized) has more throughput than 8xA100 on transformer models.

TPUs, and pods even more so, are a bit harder to work with, and TF performs far better than PyTorch on them.

https://www.deepspeed.ai/tutorials/zero-offload/

https://medium.com/analytics-vidhya/googles-tpu-research-clo...