by chillee 683 days ago
Hi, I'm one of the authors of this blog post (Horace He), along with Driss Guessous, Yanbo Liang, and Joy Dong.

We’re quite happy with this abstraction - happy to answer any questions about it!

2 comments

For those of us using the 2D NATTEN kernel from their library along with torch.compile, is this faster? Especially given all their tricks (e.g., the non-deterministic KV-parallelism)
In my (very amateurish) testing, I think the performance seemed pretty comparable (for non-dilated natten). I need to do some proper benchmarking though!
Is this for Ampere and newer only, like FA2?
I believe it should run on V100 as well (although that's definitely not as well tested), and a user reported that they got it running on T4 too.