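It is possible, because training is already done with distributed methods inside the datacenter. It is not practical, because of the sheer amount of network delay that would be piled on top of the computation: links across the internet are much slower and have far higher latency than the interconnects inside a datacenter.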
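A rough back-of-the-envelope sketch of why the delay adds up, assuming synchronous data-parallel training where every step waits for a gradient exchange. The ring all-reduce cost model, the lack of compute/communication overlap, and all the numbers below (a 400 MB gradient, 0.5 s of compute per step, 8 workers, the bandwidth and latency figures) are illustrative assumptions, not measurements:

```python
def step_time(compute_s, grad_bytes, bandwidth_bytes_per_s, latency_s, workers):
    """Rough wall-clock time of one synchronous data-parallel training step.

    Assumes (hypothetically) a ring all-reduce: each worker sends and receives
    about 2*(n-1)/n of the gradient, spread over 2*(n-1) latency-bound rounds.
    Assumes computation and communication do not overlap.
    """
    n = workers
    transfer = 2 * (n - 1) / n * grad_bytes / bandwidth_bytes_per_s
    rounds = 2 * (n - 1) * latency_s
    return compute_s + transfer + rounds


# Illustrative numbers only: ~100M fp32 parameters -> ~400 MB of gradients.
GRAD_BYTES = 4e8
COMPUTE_S = 0.5
WORKERS = 8

# Datacenter-like link: 100 Gbit/s (12.5e9 bytes/s), 10 microseconds latency.
datacenter = step_time(COMPUTE_S, GRAD_BYTES, 12.5e9, 10e-6, WORKERS)
# Home-internet-like link: 100 Mbit/s (12.5e6 bytes/s), 50 ms latency.
internet = step_time(COMPUTE_S, GRAD_BYTES, 12.5e6, 0.05, WORKERS)

print(f"datacenter: {datacenter:.2f} s/step")
print(f"internet:   {internet:.2f} s/step")
```

Under these assumptions the same step that spends most of its time computing in the datacenter spends almost all of its time waiting on the network over the internet, roughly a hundredfold slowdown. Techniques such as overlapping communication with computation or compressing gradients can shrink the gap, but they do not remove it.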