by fgfm 945 days ago
This approach feels like pruning, but the speedup is considerably higher. I'm curious how this will play out on more recent transformer architectures, though. I'd guess the speedup will be more important for the largest architectures, but even a 2x or 10x speedup on Mistral/Zephyr, Orca 2, or OpenChat3.5 would be a tremendous achievement!
1 comment

I'm curious as to how applicable this approach might be for text-to-image models like Stable Diffusion.