by ozinenko
3042 days ago
The crucial part is the polyhedral optimizer, which does indeed include several GPU-specific heuristics (multilevel parallelization, memory coalescing, etc.) and specialization to tensor sizes. An evolutionary autotuner is used to tweak the parameters of the optimizer. As a result, TC can beat cuBLAS and cuDNN on certain networks; details are in the report.
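
To give a rough idea of what evolutionary tuning of optimizer parameters looks like, here is a minimal Python sketch. It is not TC's autotuner: the parameter names (`tile_size`, `threads_per_block`, `unroll_factor`), the search space, and the `toy_cost` function are all made up for illustration. A real tuner would presumably compile each candidate and time it on the GPU instead of evaluating a formula.

```python
import random

# Hypothetical search space: candidate values for each optimizer parameter.
SEARCH_SPACE = {
    "tile_size": [1, 2, 4, 8, 16, 32, 64],
    "threads_per_block": [32, 64, 128, 256, 512],
    "unroll_factor": [1, 2, 4, 8],
}


def toy_cost(cfg):
    """Stand-in for a measured kernel runtime (assumption); lower is better."""
    return (abs(cfg["tile_size"] - 16)
            + abs(cfg["threads_per_block"] - 128) / 32
            + abs(cfg["unroll_factor"] - 4))


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def mutate(cfg, rng, rate=0.3):
    """Resample each parameter with probability `rate`."""
    child = dict(cfg)
    for name, values in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[name] = rng.choice(values)
    return child


def crossover(a, b, rng):
    """Take each parameter from one of the two parents at random."""
    return {name: rng.choice((a[name], b[name])) for name in SEARCH_SPACE}


def evolve(cost, generations=30, population_size=20, elite=4, seed=0):
    """Keep the `elite` best configurations each generation and breed the rest."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[:elite]
        children = []
        while len(children) < population_size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return min(population, key=cost)


if __name__ == "__main__":
    best = evolve(toy_cost)
    print(best, toy_cost(best))
```

Because the best configurations are carried over unchanged into the next generation, the best cost never gets worse from one generation to the next; mutation and crossover only affect the newly bred candidates.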