|
|
|
|
|
by davmre
518 days ago
|
|
They're not proposing to apply tensor decomposition to an existing collection of weights. It's an architecture in which the K, V, and Q tensors are constructed as a product of factors. The model works with the factors directly and you just need to compute their product on the forward pass (and adjoints on the backwards pass), so there's no decomposition. |
|