by diwank 774 days ago
Seconded. I’m guessing you could first write an implementation that is able to do that, and then write optimised Triton/CUDA kernels to accelerate it, but I need to investigate further.