Hacker News
by diwank, 774 days ago
Seconded. I'm guessing you could write an implementation that does that and then write optimised Triton/CUDA kernels to accelerate it, but I need to investigate further.