Hacker News new | ask | show | jobs
by cs702 1000 days ago
Thank you for posting on HN and clarifying that. I was so off-the-mark! I'm used to the convention in most deep-learning papers of using N for time steps and D for number of dimensions.

Your work looks more interesting to me now, even though cubic time and quadratic space in the number of dimensions are still a significant drag.

Consider: State-of-the-art models often work with dimensions 2-3 orders of magnitude greater than 64. For example, LLaMA 2 models operate on visible and hidden states on 4096 and 11008 dimensions, respectively.

Anyway, thank you again! I'm adding your paper to my reading list.

1 comments

Yes, sorry for that. I'm coming from physics, so L for sequence length and n for the number of dimensions make more sense :D

I agree with the cubic time and quadratic space is a big limitation for now and I'm looking for ways to make them linear (or close to linear).

> I'm coming from physics

I figured as much :-) because using n and m for vector and linear map dimensions is actually the older, more established convention.

> I'm looking for ways to make them linear (or close to linear)

The holy grail in AI research right now. I imagine you're looking, or have looked, at mapping models to frequency space to make complexity O(n log n). Take a look at the work the Hazy Research folks have been doing at Stanford -- if you haven't already.