Hacker News
by yyding 957 days ago
Good job! I noticed that you implemented many CUDA kernels yourselves. I'm curious how you weighed the trade-off between implementing the kernels in pure CUDA and building them on a compiler like TVM/Triton.
1 comment

Good question. In general, implementing kernels over page tables is tricky in tensor compilers because integer set analysis can fail in some cases (though this can be fixed with some tweaks). I think compilers like TVM can help deploy serving systems on different platforms (e.g. AMD GPUs), and I'm optimistic about this direction, though we still have to make tensor compilers more user-friendly.
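
To make the page-table point concrete, here is a minimal pure-Python sketch of the access pattern, not actual kernel code. The names (`PAGE_SIZE`, `physical_slot`, `gather_sequence`, `pool`, `page_table`) are hypothetical and chosen only for illustration. My assumption about why the analysis struggles is that the physical page is read from a table at run time, so the final index is not a simple affine function of the loop variable.

```python
# Hypothetical paged layout: logical token positions map to physical pages
# through a per-sequence page table. All names here are illustrative.
PAGE_SIZE = 4


def physical_slot(page_table, pos, page_size=PAGE_SIZE):
    """Map a logical token position to a (physical page, offset) pair."""
    logical_page, offset = divmod(pos, page_size)
    # Data-dependent lookup: the physical page id is only known at run time,
    # so the resulting address cannot be bounded from the loop variable alone.
    return page_table[logical_page], offset


def gather_sequence(pool, page_table, seq_len, page_size=PAGE_SIZE):
    """Read seq_len logically contiguous entries scattered across the pool."""
    out = []
    for pos in range(seq_len):
        page, offset = physical_slot(page_table, pos, page_size)
        out.append(pool[page][offset])
    return out


# A pool of 4 physical pages; this sequence owns pages 2, 0, 3 in that order.
pool = [[f"p{p}s{s}" for s in range(PAGE_SIZE)] for p in range(4)]
page_table = [2, 0, 3]
result = gather_sequence(pool, page_table, seq_len=10)
```

In a hand-written CUDA kernel this indirection is just one extra load. A compiler that wants to reason about the accessed region has to be told, or has to infer, what range the values in `page_table` can take.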