|
|
|
|
|
by kristjansson
508 days ago
|
|
Totally, they did great work under their constraints. Training in FP8, the MLA thing they introduce in DeepSeek-V2, etc. I just take particular issue with the attention the PTX thing is getting because (a) it's not like other labs don't do stuff like that and (b) it doesn't contribute nearly as much to their outcome as the other algorithmic and operational improvements they've made. |
|