Hacker News
by sharemywin 1153 days ago
I didn't see anything in the article about what the scaling factor was. Less than P^2, but what was it?
2 comments

The paper has a “preliminary scaling law” diagram. The shape of the graph is the same, but with 20% fewer FLOPs.

The real breakthrough is that Hyena apparently has an unlimited context window.

>The real breakthrough is that Hyena apparently has an unlimited context window.

It's extrapolated volition time (゚∀゚)

Is it? Removing the context window limit is big, no doubt, but inference still takes time (and compute).
I think GP is talking about the ability of an AI to make decisions with reference to context from the past, and therefore to have a “will extended over time”.
Presumably still O(n^2) in theory, but not for practical cases.

I think that anything replacing attention will suffer quadratic growth for some pathological examples.

Maybe if we had a better understanding of the data we could give a better definition (much like graph algorithm complexity is usually given in terms of the actual number of edges, which is O(n^2) only in the worst case).
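
To make the graph analogy concrete, here is a rough toy sketch (my own illustration, not anything from the paper or the article; `path_graph`, `complete_graph` and `bfs_work` are made-up names). It counts the work a breadth-first search does, which is one step per node visited plus one per edge endpoint inspected. On a sparse graph that is close to linear in n; only on a fully connected graph does it reach n^2.

```python
from collections import deque


def path_graph(n):
    # Sparse case (assumed for illustration): node i links only to i-1 and i+1.
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def complete_graph(n):
    # Dense worst case: every node links to every other node.
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def bfs_work(adjacency, start=0):
    # Count one unit of work per node dequeued and one per neighbor inspected.
    seen = {start}
    queue = deque([start])
    work = 0
    while queue:
        node = queue.popleft()
        work += 1
        for neighbor in adjacency[node]:
            work += 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return work


n = 200
sparse_work = bfs_work(path_graph(n))     # 598, roughly 3n
dense_work = bfs_work(complete_graph(n))  # 40000, exactly n^2
print(sparse_work, dense_work)
```

Same algorithm, same n, but the cost is driven by how many edges actually exist. That is the sense in which a bound stated in terms of the data could be more useful than the worst-case O(n^2).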