Any plans on message passing / clustering with this, so that multiple hardware nodes could be connected to the same computational graph over the network?
I'm also assuming this supports AVX-512 acceleration?
We have an allreduce op in nGraph that supports data-parallel training using OpenMPI, and we are investigating the best approaches for integrating it with frameworks. We also plan to add more collective communication ops to support model parallelism in the future.
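For anyone unfamiliar with how allreduce fits into data-parallel training, here is a rough sketch in plain Python. It is not nGraph's API and it does not use OpenMPI; it only simulates the semantics in a single process. The names `allreduce_sum` and `data_parallel_step` are made up for illustration, and averaging the summed gradients by the number of workers is an assumption about how the reduced result would be used.

```python
from typing import List


def allreduce_sum(per_worker: List[List[float]]) -> List[List[float]]:
    """Simulate allreduce(sum): every worker ends up with the elementwise sum.

    Hypothetical helper: a real implementation would exchange buffers
    over the network instead of reading them from one list.
    """
    total = [sum(values) for values in zip(*per_worker)]
    # Each worker receives its own copy of the reduced buffer.
    return [list(total) for _ in per_worker]


def data_parallel_step(
    weights: List[float],
    per_worker_grads: List[List[float]],
    lr: float,
) -> List[List[float]]:
    """One SGD step where each worker holds an identical replica of the weights."""
    reduced = allreduce_sum(per_worker_grads)
    n = len(per_worker_grads)
    # Every replica applies the same averaged gradient, so replicas stay in sync.
    return [
        [w - lr * g / n for w, g in zip(weights, grads)]
        for grads in reduced
    ]


if __name__ == "__main__":
    replicas = data_parallel_step(
        weights=[1.0, 2.0],
        per_worker_grads=[[0.5, 1.0], [1.5, 3.0]],
        lr=0.5,
    )
    print(replicas)
```

Because every worker receives the same reduced gradient, all replicas compute the same updated weights, which is what lets each node train on a different shard of the batch.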