|
|
|
|
|
by sdenton4
1163 days ago
|
|
That's the point of having multiple realizations of the same underlying model. The (depthwise) convolutional realization is extremely efficient for training, and the RNN is extremely efficient for inference. The scaling in both of these cases is much better than attention layers - as they discuss in the article. |
|