|
|
|
|
|
by paolatauru
61 days ago
|
|
solid port. the sdpa swap for sparse attention — did you notice a meaningful quality difference, or is it basically equivalent to the cuda version? curious if the pure-pytorch path added any noticeable latency hit on the m3 max |
|