pitch basically boils down to 'just change one line and it works' which sounds too good to be true, but if they actually pull it off at 100k-chip scale, that's genuinely a big deal
The pitch is "just change one line and it works". It is not "just change one line and you will get peak performance on the TPU".
From the text of the blog post:
"Portability doesn't eliminate hardware realities, so TorchTPU facilitates a tiered workflow: establish correct execution first, then use our upcoming deep-dive guidelines to identify and refactor suboptimal architectures, or to inject custom kernels, for optimal hardware utilization."
From the text of the blog post: "Portability doesn't eliminate hardware realities, so TorchTPU facilitates a tiered workflow: establish correct execution first, then use our upcoming deep-dive guidelines to identify and refactor suboptimal architectures, or to inject custom kernels, for optimal hardware utilization."