Hacker News new | ask | show | jobs
by minimaxir 1143 days ago
DeepFloyd IF is effectively the same architecture/text encoder as Imagen (https://imagen.research.google/), although that paper doesn't hypothesize why text works out a lot better.
1 comments

Right, I'm aware of the Imagen architecture, just curious to see further research determining which aspect of it is responsible for the improved text rendering.

EDIT: According to the figure in the Imagen paper FL33TW00D's response referred me to, it looks like the text encoder size is the biggest factor in the improved model performance all-around.