by kristjansson
501 days ago
To be sure, DeepSeek did great work, and this is a bit aside from TFA. But the PTX thing is a bit of a meme? What do we think torch.compile, Triton, and LLVM's NVPTX backend are doing under the hood? The warp-specialization thing quoted in [1] cites a _2014_ paper [2] out of Stanford ...

[2]: https://dl.acm.org/doi/10.1145/2555243.2555258