by ozinenko
3046 days ago
Section 7 of the paper (https://arxiv.org/abs/1802.04730) has a couple of examples. In short: yes, cuDNN is fast for the cases it was tuned for. It is probably faster on power-of-two sizes, but when you operate on a 26 x 1024954 x 3 tensor, TC can generate code specialized for exactly that shape. Want 42 x 17 x 5? TC can generate differently specialized code, with almost no effort from the user (or from performance engineers). Can a performance expert do a better job than the TC optimizer? Very likely yes, but it will very likely take much more time. TC is not a framework; it can be integrated with any framework of your liking.
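To make "specialized code" concrete, here is a toy sketch in plain Python. It is not the TC API and says nothing about how TC works internally; the function name `specialize_sum_last_axis` and the choice of a last-axis sum are made up purely for illustration. It only shows the general idea of generating a separate kernel per shape, with the sizes baked in as constants and the innermost loop fully unrolled.

```python
# Toy illustration only: this is NOT the Tensor Comprehensions API.
# All names here are hypothetical and exist just for this sketch.
from functools import lru_cache


@lru_cache(maxsize=None)
def specialize_sum_last_axis(shape):
    """Generate and compile a sum over the last axis for one fixed 3-D shape."""
    d0, d1, d2 = shape
    # The innermost reduction is unrolled because d2 is known at generation time.
    unrolled = " + ".join(f"x[i][j][{k}]" for k in range(d2)) or "0.0"
    src = (
        "def kernel(x):\n"
        f"    out = [[0.0] * {d1} for _ in range({d0})]\n"
        f"    for i in range({d0}):\n"
        f"        for j in range({d1}):\n"
        f"            out[i][j] = {unrolled}\n"
        "    return out\n"
    )
    namespace = {}
    exec(src, namespace)  # compile the generated source into a function
    return namespace["kernel"]


# Each distinct shape gets its own generated kernel; repeated shapes hit the cache.
kernel_a = specialize_sum_last_axis((42, 17, 5))
kernel_b = specialize_sum_last_axis((4, 8, 16))
```

A real code generator would of course emit GPU code and search over many more choices (tiling, mapping to threads, and so on); the point is only that nothing has to be hand-written per shape.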