At Manifest AI we have just released our open-source CUDA kernels to implement Symmetric Power Transformers, as described in our paper from back in August:
Since this is a variant of a linear attention, you get linear cost when training (as opposed to quadratic in regular attention), and constant when doing inference. This is especially attractive for longer contexts!
Have a look and play with it -- and of course contributions are very welcome! It's an early alpha!
https://manifestai.com/articles/symmetric-power-transformers...
Since this is a variant of a linear attention, you get linear cost when training (as opposed to quadratic in regular attention), and constant when doing inference. This is especially attractive for longer contexts!
Have a look and play with it -- and of course contributions are very welcome! It's an early alpha!