by boroboro4
285 days ago
But Python already operates at a completely different level of abstraction: you mention Triton yourself, and there is a new Python CUDA API too (the one similar to Triton). What's more, Flash Attention 4 is actually written in Python. Somehow Python has managed to be both a high-level and a low-level language for GPUs…
Also, isn't Flash Attention at v3-beta right now? [0] And doesn't it require one of CUDA/Triton/ROCm?
[0] https://github.com/Dao-AILab/flash-attention
But maybe I'm out of the loop? Where do you see that flash attention 4 is written in Python?