|
|
|
|
|
by pclmulqdq
1392 days ago
|
|
I'm not sure I would call the architecture very complex. It's about as simple as you can make a scale-out supercomputer. I assume they essentially do static positioning of the cluster for training jobs, and have a translation layer from the TensorFlow middle-end to their thing. Google did a similar thing with their TPUs, so it makes sense that they would have architected TF to accept exotic supercomputers as backends. |
|
It still requires many experts, both to write the XLA to hardware translation, and ML engineers who know how to write TF python that executes quickly.
(note: Google has transitioned many projects to Jax, which also writes to XLA, as TF ended up being a bit of a pig with wings)