Hacker News
by kristjansson 508 days ago
Totally, they did great work under their constraints: training in FP8, the MLA (multi-head latent attention) they introduced in DeepSeek-V2, etc. I just take particular issue with the attention the PTX thing is getting, because (a) it's not like other labs don't do stuff like that and (b) it doesn't contribute nearly as much to their outcome as the other algorithmic and operational improvements they've made.