|
|
|
|
|
by rishabhaiover
181 days ago
|
|
I did an experiment on FlashAttention in Triton to measure the impact of caching tiles in the Shared Memory. Surprisingly, it had a non-monotonic relationship with prefetching these tiles and it was kernel dependent. Attention kernel benefits from prefetching caches while MLP W1 doesn't. |
|