Why is it inference only? At least the operations are the same...just a bunch of linear algebra
Inference also prefers different IO patterns, because you don't need to keep the activations for every layer ready for backpropogation.
Inference also prefers different IO patterns, because you don't need to keep the activations for every layer ready for backpropogation.