Hacker News
by Jabbles 2976 days ago
Do you have more information about this bit?

the TPU implementation applies very compute-intensive image pre-processing steps and actually sacrifices raw throughput

Thanks

1 comment

In general, you try to keep the TPU/GPU busy 100% of the time, so enough data needs to be readily available at any point. In this example, images need to be read from disk, decoded, and transformed (cropped, resized, normalized, etc.) before they can be fed to the TPU. The transformations can be computationally intensive, so they can actually become the bottleneck.
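To make the "keep the accelerator busy" idea concrete, here is a minimal sketch of a prefetching input pipeline. It is not the pipeline used in the TPU implementation; `preprocess`, `prefetch`, `train_step`, and the `num_workers` / `buffer_size` values are all hypothetical names and numbers chosen for illustration. CPU workers prepare the next few items while the consumer is busy with the current one.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def preprocess(raw):
    """Hypothetical stand-in for decode + crop + resize + normalize."""
    # Scale byte values into [0, 1]; a real pipeline does far more CPU work.
    return [b / 255.0 for b in raw]


def prefetch(raw_items, num_workers=4, buffer_size=8):
    """Yield preprocessed items in input order, keeping work in flight."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = deque()
        for raw in raw_items:
            # Submit work ahead of the consumer, up to buffer_size items.
            pending.append(pool.submit(preprocess, raw))
            if len(pending) >= buffer_size:
                yield pending.popleft().result()
        # Drain whatever is still in flight once the input is exhausted.
        while pending:
            yield pending.popleft().result()


def train_step(batch):
    """Hypothetical stand-in for the accelerator step."""
    return sum(batch)


# Fake "images": ten 3-byte records instead of real files on disk.
raw_images = [bytes([i, i + 1, i + 2]) for i in range(10)]
for batch in prefetch(raw_images):
    train_step(batch)
```

If `preprocess` is slower than `train_step`, the consumer ends up waiting on `result()`, which is the bottleneck described above. Note that threads only help here when the heavy work releases Python's GIL (as native image libraries typically do); for pure-Python transforms, a process pool would be the closer analogue.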

In terms of how much compute power the TPU pre-processing needs, I only have very rough numbers: I ran the same pre-processing while training ResNet-50 on a node with 4 GPUs, and it consistently used more than 22 CPU cores (a figure that includes all other CPU tasks running during training).
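As a back-of-the-envelope reading of those numbers, the sketch below divides the observed core count by the number of GPUs, and shows how one could estimate the cores needed from throughput. Only the 22 cores and 4 GPUs come from the comment above; the function names and the throughput and per-image cost passed to `cores_needed` are made-up illustrative values, not measurements.

```python
def cores_per_accelerator(total_cores, num_accelerators):
    """Average CPU cores consumed per accelerator."""
    return total_cores / num_accelerators


def cores_needed(images_per_second, cpu_seconds_per_image):
    """Cores required to preprocess images as fast as they are consumed."""
    return images_per_second * cpu_seconds_per_image


# From the comment: more than 22 cores for 4 GPUs, so this is a lower bound.
print(cores_per_accelerator(22, 4))  # 5.5

# Hypothetical inputs: 1000 images/s consumed, 20 ms of CPU time per image.
print(cores_needed(1000, 0.02))
```

The 5.5 cores per GPU is a floor rather than an exact figure, since the observed usage was above 22 cores and included other CPU work.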