by ozinenko 3042 days ago
The crucial part is the polyhedral optimizer, which does indeed include several GPU-specific heuristics (multilevel parallelization, memory coalescing, etc.) and specialization to tensor sizes. An evolutionary autotuner is used to tweak the parameters of the optimizer. As a result, TC can beat cuBLAS and cuDNN on certain networks; details are in the report.
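
To make the autotuning idea concrete, here is a rough sketch of an evolutionary search over optimizer parameters. It is not TC's actual autotuner: the parameter names (`tile`, `threads`, `unroll`), the search space, and the `toy_cost` function are all made up for illustration. A real tuner would compile each candidate and time it on the GPU instead of calling a formula.

```python
import random

# Hypothetical search space: illustrative names and values only,
# not TC's real mapping options.
SEARCH_SPACE = {
    "tile": [8, 16, 32, 64],
    "threads": [32, 64, 128, 256],
    "unroll": [1, 2, 4, 8],
}

def toy_cost(cfg):
    """Stand-in for a measured kernel runtime (lower is better).

    Invented formula; a real tuner would compile and time the kernel.
    """
    return (abs(cfg["tile"] - 32) / 32
            + abs(cfg["threads"] - 128) / 128
            + abs(cfg["unroll"] - 4) / 4)

def random_config(rng):
    """Draw one candidate uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def crossover(a, b, rng):
    """Take each parameter from one of the two parents at random."""
    return {name: rng.choice((a[name], b[name])) for name in a}

def mutate(cfg, rng, rate=0.3):
    """Resample each parameter with probability `rate`."""
    return {name: (rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value)
            for name, value in cfg.items()}

def autotune(cost, generations=20, population=12, elite=4, seed=0):
    """Evolutionary search that keeps the `elite` best candidates each generation."""
    rng = random.Random(seed)
    pop = [random_config(rng) for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=cost)
        parents = pop[:elite]  # elitism: the best candidates always survive
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population - elite)
        ]
        pop = parents + children
    return min(pop, key=cost)

best = autotune(toy_cost)
print(best, toy_cost(best))
```

Because the best candidates are carried over unchanged, the best cost never gets worse from one generation to the next, which is the property you want when each evaluation is an expensive GPU benchmark.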

What would be the relationship between TC and something like CuPy?
CuPy itself is just a framework, and you could slot TC in as a thing that generates operators for it. CuPy also famously has support for inline CUDA kernels; the equivalent TC kernels are shorter and autotunable.