Y
Hacker News
new
|
ask
|
show
|
jobs
by
minimaxir
670 days ago
The diffusion process is conditioned on CLIP text, which works better (in theory) since the encoded text is aligned with images.