
In general, you try to keep the TPU/GPU busy 100% of the time, so enough data needs to be readily available at any point. In this example, images need to be read from disk, decoded, and transformed (cropped, resized, normalized, etc.) before they can be fed to the TPU. The transformations can be computationally intensive, so they can actually become the bottleneck.
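To make the "keep the accelerator fed" idea concrete, here is a minimal sketch of a prefetching loader using only the Python standard library. The names `preprocess` and `prefetch`, the worker count, and the buffer size are all made up for illustration, and the `sleep` just stands in for decode/crop/resize work. Real pipelines typically use processes or native code rather than Python threads, since the GIL limits CPU-bound threads in CPython.

```python
import queue
import threading
import time

_DONE = object()  # sentinel a worker emits when it runs out of work


def preprocess(sample):
    # Hypothetical stand-in for read + decode + crop/resize/normalize.
    time.sleep(0.01)
    return sample * 2


def prefetch(samples, num_workers=4, buffer_size=8):
    """Yield preprocessed samples while background workers stay ahead.

    Output order is not guaranteed to match input order.
    """
    todo = queue.Queue()
    for s in samples:
        todo.put(s)
    # Bounded buffer: workers block when it is full, so memory stays capped.
    ready = queue.Queue(maxsize=buffer_size)

    def worker():
        while True:
            try:
                s = todo.get_nowait()
            except queue.Empty:
                break
            ready.put(preprocess(s))
        ready.put(_DONE)

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()

    finished = 0
    while finished < num_workers:
        item = ready.get()
        if item is _DONE:
            finished += 1
        else:
            yield item


# The training loop would consume from the buffer instead of waiting on disk.
batch = list(prefetch(range(20)))
```

If the consumer ever finds the buffer empty, the accelerator is waiting on the CPU, which is the bottleneck described above.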

In terms of how much compute power the pre-processing for the TPU needs, I only have very rough numbers: I ran the same pre-processing while training ResNet-50 on a node with 4 GPUs, and it consistently used more than 22 CPU cores (including all of the other CPU tasks that run during training).
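As a back-of-the-envelope check on those numbers (just arithmetic on the figures above, not a new measurement), 22 cores across 4 GPUs works out to roughly 5.5 cores per GPU. The function name is made up for illustration, and since the 22 was a "more than" figure that also includes other CPU tasks, treat the result as a rough indication only.

```python
def cores_per_accelerator(busy_cpu_cores, num_accelerators):
    # Average number of CPU cores kept busy per accelerator.
    return busy_cpu_cores / num_accelerators


# 22 busy cores on a 4-GPU node (the rough figures quoted above).
per_gpu = cores_per_accelerator(22, 4)
print(per_gpu)  # 5.5
```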