
I guess what I'm saying is I'd love to see an LLM actually have its attention mechanism replaced with this and get benchmarked on real-world tasks against quadratic attention. They don't seem to have done that here. They claim that it's close to being the same, but my experience tells me that it needs to do better than "pretty close."

They also haven't tried to write a high-performance kernel in Triton yet. If it goes the way my last experiment with Taylor did, they're in for some bad news.

I'm just a hobbyist though, so it's certainly possible that people with more time and resources could outperform me without much effort. I just want to see it tested on something familiar and benchmarkable.