by paolatauru 61 days ago
solid port. the sdpa swap for sparse attention — did you notice a meaningful quality difference, or is it basically equivalent to the cuda version? curious if the pure-pytorch path added any noticeable latency hit on the m3 max