Hacker News


	by kristjansson 548 days ago
	To be sure, DeepSeek did great work, and this is a bit aside from TFA. But isn't the PTX thing a bit of a meme? What do we think torch.compile, Triton, and LLVM's NVPTX backend are doing under the hood? The warp-specialization thing quoted in [1] cites a _2014_ paper [2] out of Stanford ... [2]: https://dl.acm.org/doi/10.1145/2555243.2555258


rcarmo 548 days ago

Yeah, well, it's not _just_ PTX. Think about what you would do if you had to work on a resource-constrained system (that's a mindset I relate to closely, since I still write C++ for MCUs, and it makes you dig _under_ the libraries to save resources).


kristjansson 548 days ago

Totally, they did great work under their constraints: training in FP8, the MLA (multi-head latent attention) thing they introduced in DeepSeek-V2, etc. I just take particular issue with the attention the PTX thing is getting, because (a) it's not like other labs don't do stuff like that, and (b) it doesn't contribute nearly as much to their outcome as the other algorithmic and operational improvements they've made.
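To put a rough number on why MLA matters more than hand-written PTX, here is a minimal sketch of the per-token, per-layer KV-cache arithmetic. It assumes the attention dimensions reported for DeepSeek-V2 (128 heads of size 128, a 512-dim compressed KV latent, and a 64-dim decoupled RoPE key); the helper names are hypothetical, and the count is in cached elements only, ignoring dtype and number of layers.

```python
# Assumed DeepSeek-V2 attention dimensions (per layer); helper names are hypothetical.
N_HEADS = 128     # n_h: attention heads
HEAD_DIM = 128    # d_h: per-head dimension
LATENT_DIM = 512  # d_c: compressed KV latent (= 4 * d_h)
ROPE_DIM = 64     # d_h^R: decoupled RoPE key dimension (= d_h / 2)


def mha_kv_elems_per_token_per_layer(n_heads: int, head_dim: int) -> int:
    """Standard multi-head attention caches a full key and a full value per head."""
    return 2 * n_heads * head_dim


def mla_kv_elems_per_token_per_layer(latent_dim: int, rope_dim: int) -> int:
    """MLA caches one shared compressed latent plus one shared RoPE key."""
    return latent_dim + rope_dim


if __name__ == "__main__":
    mha = mha_kv_elems_per_token_per_layer(N_HEADS, HEAD_DIM)
    mla = mla_kv_elems_per_token_per_layer(LATENT_DIM, ROPE_DIM)
    print(f"MHA: {mha} elements/token/layer")
    print(f"MLA: {mla} elements/token/layer")
    print(f"reduction: {mha / mla:.1f}x")
    # GQA with n_g groups caches 2 * n_g * d_h elements; solve for the same size.
    print(f"equivalent GQA groups: {mla / (2 * HEAD_DIM)}")
```

Under those assumptions MHA caches 32768 elements per token per layer versus 576 for MLA, roughly a 57x reduction, or about the cache size of GQA with 2.25 groups. That kind of saving comes from the architecture, not from the instruction-level tuning.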
