by scv119 1098 days ago
I believe you could modify the FlashAttention kernel slightly to implement the same kernel as this PagedAttention, since both of them operate on the key/value cache at the block level.
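Here is a minimal NumPy sketch of the idea, assuming a single query vector and a single head. It runs FlashAttention-style tiling with an online softmax over KV blocks. The only PagedAttention-specific step is looking up each logical block's physical location in a block table. `paged_flash_attention`, `block_table`, and the cache layout are hypothetical names chosen for illustration. They are not the actual API or kernel of vLLM or FlashAttention.

```python
import numpy as np

def paged_flash_attention(q, k_cache, v_cache, block_table, seq_len, block_size):
    """Single-query attention over a paged KV cache, one block at a time,
    using FlashAttention-style online softmax (hypothetical layout).

    q:           (d,) query vector
    k_cache:     (num_physical_blocks, block_size, d) pool of key blocks
    v_cache:     (num_physical_blocks, block_size, d) pool of value blocks
    block_table: block_table[logical_block] -> physical block index
    """
    d = q.shape[0]
    scale = 1.0 / np.sqrt(d)
    m = -np.inf        # running max of attention scores
    l = 0.0            # running sum of exp(score - m)
    acc = np.zeros(d)  # running unnormalized weighted sum of values
    num_blocks = (seq_len + block_size - 1) // block_size
    for logical in range(num_blocks):
        phys = block_table[logical]  # the paging indirection
        n = min(block_size, seq_len - logical * block_size)  # valid tokens in block
        k = k_cache[phys, :n]
        v = v_cache[phys, :n]
        s = (k @ q) * scale
        m_new = max(m, s.max())
        correction = np.exp(m - m_new)  # 0.0 on the first block since m = -inf
        p = np.exp(s - m_new)
        l = l * correction + p.sum()
        acc = acc * correction + p @ v
        m = m_new
    return acc / l

def reference_attention(q, k, v):
    # Plain softmax attention over contiguous K/V, for comparison.
    s = (k @ q) / np.sqrt(q.shape[0])
    w = np.exp(s - s.max())
    return (w / w.sum()) @ v

# Scatter a contiguous sequence into non-contiguous physical blocks.
rng = np.random.default_rng(0)
d, block_size, seq_len = 8, 4, 10
k_full = rng.standard_normal((seq_len, d))
v_full = rng.standard_normal((seq_len, d))
q = rng.standard_normal(d)

block_table = [4, 1, 3]  # hypothetical logical -> physical mapping
k_cache = np.zeros((6, block_size, d))
v_cache = np.zeros((6, block_size, d))
for logical, phys in enumerate(block_table):
    start = logical * block_size
    end = min(start + block_size, seq_len)
    k_cache[phys, : end - start] = k_full[start:end]
    v_cache[phys, : end - start] = v_full[start:end]

out = paged_flash_attention(q, k_cache, v_cache, block_table, seq_len, block_size)
print(np.allclose(out, reference_attention(q, k_full, v_full)))  # True
```

Apart from the `block_table[logical]` lookup, the loop body is the ordinary block-wise online-softmax update. That is why the change to a FlashAttention-style kernel could plausibly stay small.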