|
|
|
|
|
by farhanhubble
189 days ago
|
|
It might we worth it to use that subset to initialize the weights of future models but more importantly you could save a huge number of computational cycles by using the lower dimensional weights at the time of inference. |
|
Key point being: the parameters might be picked off a lower dimensional manifold (in weight space), but this doesn't imply that lower-rank activation space operators will be found. So translation to inference-time isn't clear.