Hacker News
by ramblerman 1255 days ago
It's interesting that the requirements for a text model are so much greater than for images.

Stable Diffusion can run on a home PC, while it seems you need a supercomputer for GPT-3. I'm not sure that would have been my intuition.

1 comments

I think it has to do with text being much more precise. Your stably diffused cartoon avatar having six fingers is not nearly as noticeable as a language model misspelling every second word in a chat. So you need fewer resources to get to a human-acceptable result.
No, diffusion models are just more efficient.