by refulgentis
708 days ago
Clarifying: given the question "How much is the flash attention algorithm tied to the hardware?", the answer is 0. E.g. you can find generic flash attention recently added in llama.cpp and ONNX (MS needed it for Phi-3, which is in turn needed for Recall). On the side question of novelty, I have no direct knowledge. IMHO, asking that question would devolve the way novelty arguments do in any field: there's always someone else who can claim they did 80% of $X via $X-1, therefore $X is by and large not novel. Ad infinitum.
I definitely don't mean to take away from Tri/FA by mentioning novelty - I'm just repeating from the paper, which refers back to algebraic aggregates[0] in its discussion of their tiled softmax.
[0]: https://web.stanford.edu/class/cs345d-01/rl/olap.pdf
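
For anyone wondering what "algebraic aggregate" means for softmax, here is a rough sketch in plain Python. This is only the softmax normalizer, not the FA kernel, and it is not code from the paper; the function names (tile_stats, merge, tiled_softmax) are made up for illustration and it assumes a non-empty input. Each tile is reduced to a fixed-size summary (running max, rescaled sum of exponentials), and two summaries can be merged without revisiting the underlying elements. Nothing in it depends on a particular piece of hardware.

```python
import math


def tile_stats(xs):
    """Summarize one tile as (max, sum of exp(x - max))."""
    m = max(xs)
    return m, sum(math.exp(x - m) for x in xs)


def merge(a, b):
    """Combine two tile summaries into the summary of their union."""
    (m1, s1), (m2, s2) = a, b
    m = max(m1, m2)
    # Rescale each partial sum to the shared max before adding.
    return m, s1 * math.exp(m1 - m) + s2 * math.exp(m2 - m)


def tiled_softmax(xs, tile=4):
    """Softmax built from per-tile summaries (assumes xs is non-empty)."""
    stats = None
    for i in range(0, len(xs), tile):
        part = tile_stats(xs[i:i + tile])
        stats = part if stats is None else merge(stats, part)
    m, s = stats
    return [math.exp(x - m) / s for x in xs]


def reference_softmax(xs):
    """Ordinary numerically stable softmax over the whole input, for comparison."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Calling tiled_softmax(xs, tile=4) should agree with reference_softmax(xs) up to floating-point rounding, whatever tile size you pick.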