| HN Mirror

In the previous Cloud TPU architecture, PyTorch and JAX users had to create a separate CPU VM for every remote TPU host and arrange for these CPU hosts to communicate indirectly with the TPU hosts via gRPC. This was cumbersome and made debugging difficult.

With TPU VMs, none of this is necessary. You can SSH directly into each TPU host machine and install arbitrary software on a VM there to handle data loading and other tasks with much greater flexibility.

The blog post provides an example of training cost improvement using PyTorch / XLA on TPU VMs in the "Local execution of input pipeline" section. Hopefully we will be able to provide more tutorials on using PyTorch / XLA with TPU VMs soon.

With TPU VMs, workloads that require lots of CPU-TPU communication can now do that communication locally instead of going over the network, which can improve performance.