by zak
2354 days ago
(I'm one of the Cloud TPU product leads) We've seen multiple BERT-related PyTorch models training successfully on Cloud TPUs, including training at scale on large, distributed Cloud TPU Pod slices. Would you consider filing a GitHub issue at https://github.com/pytorch/xla or emailing pytorch-tpu@googlegroups.com to provide a bit more context about the specific issue you encountered? Here's the current PyTorch/TPU troubleshooting guide, which provides information on how to collect and interpret metrics that are very helpful for debugging:
https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.m... Thanks!
How do you see it? Do you look at your client's code?