by fgfm
945 days ago
This approach feels like pruning, but the speedup is considerably higher. I'm curious how this will play out on more recent transformer architectures, though: I guess the speedup will be more important for the largest architectures, but even a 2x or 10x speedup on Mistral/Zephyr, Orca 2, or OpenChat3.5 would be a tremendous achievement!