by sadpasture 1252 days ago
I think it has to do with text being much more precise. Your Stable Diffusion cartoon avatar having six fingers is not nearly as noticeable as a language model's chat misspelling every second word. So for images you need fewer resources to get to a human-acceptable result.
1 comment

no, diffusion models are just more efficient