by the_panopticon
258 days ago
Very interesting. It sounds like tuning at the PTX level can improve workload efficiency. For example, the DeepSeek folks write: "Specifically, we employ customized PTX (Parallel Thread Execution) instructions" (https://arxiv.org/abs/2412.19437).