by cameldrv
2819 days ago
I think the reason no one implements Krizhevsky's OWT (at least in standard training scripts; nothing stops you from doing it in TensorFlow) is that its model parallelism only pays off when a layer has more weights than input/output activations. That was true of the FC layers in AlexNet, but hardly anyone uses large FC layers anymore.
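To put rough numbers on the weights-versus-activations point, here is a back-of-the-envelope sketch. It assumes a simplified cost model that is mine, not taken from the OWT paper: data parallelism has to synchronize one gradient per weight each step, while model parallelism has to exchange a layer's input and output activations for every example in the batch. The layer shapes are the commonly cited single-tower AlexNet dimensions, biases are ignored, and the helper names and the batch size of 128 are made up for illustration.

```python
# Rough comparison of per-step communication volume for one layer.
# Assumed (simplified) cost model, not the exact analysis from the paper:
#   data parallelism  -> sync one gradient value per weight
#   model parallelism -> exchange input + output activations per example


def fc_layer(n_in, n_out):
    """Weight and activation counts for a fully connected layer (no bias)."""
    return {"weights": n_in * n_out, "acts": n_in + n_out}


def conv_layer(h, w, c_in, c_out, k):
    """Counts for a k x k conv, assuming stride 1 and 'same' padding (no bias)."""
    return {"weights": k * k * c_in * c_out, "acts": h * w * (c_in + c_out)}


def prefers_model_parallelism(layer, batch_size):
    """True when syncing weights costs more than shipping activations."""
    return layer["weights"] > batch_size * layer["acts"]


BATCH = 128  # hypothetical per-step batch size

# Commonly cited AlexNet shapes (single-tower view).
fc6 = fc_layer(6 * 6 * 256, 4096)
conv3 = conv_layer(13, 13, 256, 384, 3)

for name, layer in [("fc6", fc6), ("conv3", conv3)]:
    print(
        name,
        "weights:", layer["weights"],
        "activations per batch:", BATCH * layer["acts"],
        "model parallelism cheaper:", prefers_model_parallelism(layer, BATCH),
    )
```

Under these assumptions fc6 has far more weights than activations per batch, so splitting the layer looks attractive, while conv3 goes the other way and plain data parallelism is the cheaper choice.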