by kristjansson 501 days ago
To be sure, DeepSeek did great work, and this is a bit aside from TFA. But the PTX thing is a bit of a meme? What do we think torch.compile and triton and llvm's nvptx backend are doing under the hood? The warp-specialization thing quoted in [1] cites a _2014_ paper[2] out of Stanford ...

[2]: https://dl.acm.org/doi/10.1145/2555243.2555258

1 comments

Yeah, well, it's not _just_ PTX. Think about what you would do if you had to work in a resource-constrained system (that's a mindset I closely relate to since I still do C++ for MCUs, and it makes you dig _under_ the libraries to save resources).
Totally, they did great work under their constraints. Training in FP8, the MLA thing they introduced in DeepSeek-V2, etc. I just take particular issue with the attention the PTX thing is getting because (a) it's not like other labs don't do stuff like that and (b) it doesn't contribute nearly as much to their outcome as the other algorithmic and operational improvements they've made.