by cs702
1000 days ago
Thank you for posting on HN and clarifying that. I was so off the mark! I'm used to the convention in most deep-learning papers of using N for time steps and D for number of dimensions. Your work looks more interesting to me now, even though cubic time and quadratic space in the number of dimensions are still a significant drag. Consider: state-of-the-art models often work with dimensions 2-3 orders of magnitude greater than 64. For example, LLaMA 2 models operate on visible and hidden states of 4096 and 11008 dimensions, respectively. Anyway, thank you again! I'm adding your paper to my reading list.
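To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes cost scales exactly as d^3 in time and d^2 in space with all constant factors ignored, and takes 64 as the baseline dimension mentioned above. The helper name `relative_cost` is made up for illustration and is not from the paper.

```python
# Rough scaling estimate: how much more time and space a larger dimension
# would need relative to a baseline of 64, assuming pure d^3 time and d^2
# space. Constant factors and lower-order terms are ignored (assumption).
BASE_DIM = 64


def relative_cost(d, base=BASE_DIM):
    """Return (time multiplier, space multiplier) relative to `base`."""
    ratio = d / base
    return ratio ** 3, ratio ** 2


# 4096 and 11008 are the LLaMA 2 sizes quoted in the comment above.
for d in (4096, 11008):
    time_x, space_x = relative_cost(d)
    print(f"d={d}: ~{time_x:,.0f}x time, ~{space_x:,.0f}x space vs d={BASE_DIM}")
```

Under those assumptions, going from 64 to 4096 dimensions costs about 262,144 times more time and 4,096 times more space, and going to 11008 costs about 5 million times more time and roughly 29,600 times more space.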
I agree that the cubic time and quadratic space are a big limitation for now, and I'm looking for ways to make them linear (or close to linear).