Hacker News
Speed Is All You Need: On-Device Acceleration of Large Diffusion Models (arxiv.org)
56 points by Pelayu 1142 days ago
3 comments

Interestingly, these are OpenCL kernels, so in theory some of the optimizations might run out of the box on CPUs.

It would be instructive to compare their speedups on the iPhone to the Apple CoreML implementation: https://github.com/apple/ml-stable-diffusion

This is incredible, can't wait to run it. Is there a code sample somewhere to reproduce their Samsung S23 results?
This is definitely a welcome development, but I'm getting so tired of all these papers paying homage to the original Transformer paper in their titles. It isn't funny anymore, it doesn't give due credit, and it doesn't indicate quality. On top of that, the original paper's title was a pretty poor choice in hindsight, which shows how little the original authors foresaw the gigantic impact of their paper.
Why do you think the original paper title was a poor choice? It highlights exactly the main idea, the main aspect studied in that paper.

The paper title is "Attention is all you need", for those who don't know.

And attention was already very well known at that point in time and part of the standard translation model. But all those attention-based encoder-decoder models were using LSTMs, or maybe CNNs. Self-attention was also already known at that point, although still rarely used. So the novelty was the study of whether a model still works when you remove almost everything except attention.

On the one hand, such a study was just interesting in itself. But such a model also had some advantages, like faster training. In the next few years, faster training was actually the main advantage over LSTM-based models. For a long time, it was never really clear whether a Transformer is really better than an LSTM-based model when both are trained for the same number of epochs. In most comparisons, Transformers were simply trained for many more epochs.
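For readers who haven't seen it, the attention-only building block discussed above can be sketched as single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V. This is a minimal plain-Python illustration, not code from either paper; the helper names and the weight matrices `wq`, `wk`, `wv` are hypothetical. It also hints at why training was faster: each output position is computed from all input positions at once, with no recurrence over time steps.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (a is n x m, b is m x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x holds one row per sequence position. No position waits on the
    previous one, unlike an LSTM, so all rows could be computed in parallel.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    # seq_len x seq_len score matrix: every position attends to every other.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 positions, dimension 2
    for row in self_attention(tokens, identity, identity, identity):
        print([round(v, 3) for v in row])
```

The score matrix has one entry per pair of positions, so its size grows quadratically with sequence length. That is one reason attention-based models work with a bounded context window in practice.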

I'm well aware of the research that led to it. I was already working in the field back then, and I remember that the community was far from realizing how monumental this paper would end up being. Otherwise the authors probably would have considered a more informative, or at least less ambiguous, title. It also didn't help that the architecture they described (encoder-decoder) was actually even more complicated than what we have now in GPT and the like. And the really important thing was not that it could train more epochs than recurrent architectures (although that certainly helped the huge models that came later), but that it could drastically extend context length for sequence tasks. They went from a theoretically infinite (but in practice very limited) context length to a fundamentally limited but practically obtainable one.
I am not sure it’s not funny. Elon Musk gave ChatGPT $100 million. There are 9 billion people in the world… he could have made everyone a millionaire many times over! (In SHIB.) I feel like that amount of ShibaCoin would be life changing for most people. Yet he wasted it all on a company that became for-profit and sold shares to Microsoft instead.

(And no, before you say it, my math checks out!)

You must have a PhD in math because it all checks out. No errors :P