Hacker News
by
ipsum2
1100 days ago
The ideas are orthogonal and can, in theory, be used at the same time.
scv119
1100 days ago
I believe you could slightly modify the FlashAttention kernel to do the same thing as this paged attention kernel, since both operate on the key/value cache at the block level.
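To make the comment concrete, here is a rough Python sketch. It is not vLLM's or FlashAttention's actual kernel code. It assumes a hypothetical `BLOCK_SIZE` of 4 tokens, a pool of fixed-size physical KV blocks, and a per-sequence `block_table` that maps logical block indices to physical ones (all names are illustrative). Attention for a single query is computed one block at a time with a FlashAttention-style online softmax, which keeps a running max and a running sum. The only "paged" part is that each block is looked up through `block_table` rather than read from contiguous memory. The helpers `scatter_into_blocks` and `naive_attention` are also hypothetical and exist only so the result can be checked against ordinary softmax attention.

```python
import math
from typing import List, Tuple

Vector = List[float]
BLOCK_SIZE = 4  # hypothetical number of tokens per KV block


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def scatter_into_blocks(keys: List[Vector], values: List[Vector],
                        block_table: List[int]) -> Tuple[list, list]:
    """Copy a contiguous sequence into a pool of fixed-size physical blocks."""
    num_physical = max(block_table) + 1
    pool_k = [[[0.0] * len(keys[0]) for _ in range(BLOCK_SIZE)] for _ in range(num_physical)]
    pool_v = [[[0.0] * len(values[0]) for _ in range(BLOCK_SIZE)] for _ in range(num_physical)]
    for t, (k, v) in enumerate(zip(keys, values)):
        physical = block_table[t // BLOCK_SIZE]  # logical block -> physical block
        pool_k[physical][t % BLOCK_SIZE] = list(k)
        pool_v[physical][t % BLOCK_SIZE] = list(v)
    return pool_k, pool_v


def paged_attention(query: Vector, key_blocks: list, value_blocks: list,
                    block_table: List[int], seq_len: int) -> Vector:
    """Single-query attention over a paged KV cache, using an online softmax."""
    scale = 1.0 / math.sqrt(len(query))
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(value_blocks[0][0])
    for logical, physical in enumerate(block_table):
        start = logical * BLOCK_SIZE
        n = min(BLOCK_SIZE, seq_len - start)  # the last block may be partially filled
        if n <= 0:
            break
        keys = key_blocks[physical][:n]
        values = value_blocks[physical][:n]
        scores = [dot(query, k) * scale for k in keys]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new max (exp(-inf) == 0.0).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, values):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vi for a, vi in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


def naive_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference: ordinary softmax attention over contiguous keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [2.0, 0.0], [-1.0, 0.5]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
    query = [0.3, -0.7]
    table = [1, 0]  # logical block 0 lives in physical block 1, and vice versa
    pk, pv = scatter_into_blocks(keys, values, table)
    out = paged_attention(query, pk, pv, table, len(keys))
    ref = naive_attention(query, keys, values)
    print(all(abs(a - b) < 1e-9 for a, b in zip(out, ref)))
```

The block loop is the same tiling a FlashAttention-style kernel already performs. The block-table indirection is the only extra step, which is why a small change to such a kernel seems plausible.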