|
|
|
|
|
by mfkasim1
1000 days ago
|
|
The author of the paper here. The cubic time and quadratic space complexity is with respect to the number of dimensions (n), not the sequential time steps. The time and space complexity w.r.t. the number of time steps (L) are both linear, specifically O(Ln^3) time and O(Ln^2) space. The method gives larger speed up with longer sequence (>1k time steps) but with relatively small number of dimensions (<= 64 dimensions from our experiments). |
|
Your work looks more interesting to me now, even though cubic time and quadratic space in the number of dimensions are still a significant drag.
Consider: State-of-the-art models often work with dimensions 2-3 orders of magnitude greater than 64. For example, LLaMA 2 models operate on visible and hidden states on 4096 and 11008 dimensions, respectively.
Anyway, thank you again! I'm adding your paper to my reading list.