| HN Mirror


by erwannmillon 1048 days ago

Btw, I did this in pixel space for simplicity, cool animations, and compute costs. Would be really interesting to do this as an LDM (though of course you can't really do the LAB color space thing, unless you maybe train an AE specifically for that color space).
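For anyone unfamiliar with the LAB color space mentioned here, a minimal sketch of the conversion for a single pixel, using the standard sRGB to CIELAB formulas with a D65 white point. The function name `srgb_to_lab` is made up for illustration and this is not code from the project. The point it shows is that L carries the lightness (the greyscale information) while a and b carry the color, so a grey pixel has a and b near zero.

```python
def srgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components in [0, 1]) to CIELAB, D65 white point.

    Hypothetical helper for illustration only.
    """
    def to_linear(c):
        # Undo the sRGB gamma curve.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    # Normalize by the D65 reference white.
    xn, yn, zn = 0.95047, 1.0, 1.08883

    def f(t):
        delta = 6 / 29
        if t > delta ** 3:
            return t ** (1 / 3)
        return t / (3 * delta * delta) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)

    lightness = 116 * fy - 16   # L: 0 (black) to about 100 (white)
    a = 500 * (fx - fy)         # a: green (-) to red (+)
    b_axis = 200 * (fy - fz)    # b: blue (-) to yellow (+)
    return lightness, a, b_axis


if __name__ == "__main__":
    print(srgb_to_lab(0.5, 0.5, 0.5))  # grey: a and b are close to 0
    print(srgb_to_lab(1.0, 0.0, 0.0))  # red: a is strongly positive
```

Because L is roughly the greyscale image, a colorization model working in LAB only has to produce the two color channels. That split is harder to get from a pretrained RGB autoencoder, which is presumably why a LAB-specific AE comes up.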

I was really interested in how color is represented in latent space and ran some experiments with VQGAN-CLIP. You can actually do a (not great) colorization of an image by encoding it w/ VQGAN and using a prompt like "a colorful image of a woman".

Would be fun to experiment with if anyone wants to try. I'd love to see any results if someone builds it.

2 comments

carbocation 1048 days ago

> I did this in pixel space for simplicity, cool animations, and compute costs

A slight nitpick: wouldn't doing diffusion in the latent space be cheaper?


erwannmillon 1048 days ago

Depends. Given the low res, the 3x64x64 pixel space image is smaller than the latents you would get from encoding a higher-res image with models like VQGAN or the Stable Diffusion VAE at their native resolutions.
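A quick back-of-the-envelope check of that size comparison. The numbers for the Stable Diffusion VAE (512x512 input, 8x spatial downsampling, 4 latent channels) are assumptions about the v1 models, not something stated in this thread, and the helper names are made up for illustration.

```python
def num_elements(channels, height, width):
    """Number of values in a channels x height x width tensor."""
    return channels * height * width


def latent_shape(image_size, downsample, latent_channels):
    """Shape of the latent for a square image (hypothetical helper)."""
    side = image_size // downsample
    return latent_channels, side, side


# Pixel space model from the post: 3x64x64.
pixel = num_elements(3, 64, 64)

# Assumed Stable Diffusion v1 VAE setup: 512x512 image, 8x downsample, 4 channels.
sd_latent = num_elements(*latent_shape(512, 8, 4))

print(pixel)      # 12288
print(sd_latent)  # 16384
```

Under those assumptions the latent is 4x64x64, so it has the same spatial size as the 64x64 pixel image but one more channel, and the pixel space tensor is the smaller of the two.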

It's easier to get a sense of what's going wrong with a pixel space model though. With latent space, there's always the question of how color is represented in latent space / how entangled it is with other structure / semantics.

Starting in pixel space removed a lot of variables from the equation, but latent diffusion is the obvious next step


ShamelessC 1048 days ago

Not necessarily if you don’t already have a pretrained autoencoder.


xigency 1048 days ago

Question, how long did it take to train this model and what hardware did you use?


erwannmillon 1048 days ago

Took a lot of failed experiments; the model would keep converging to greyscale / sepia images. I think one of the ways I fixed it was by adding a greyscale encoder to the arch and using its output embedding as additional conditioning. Can't remember if I only added it to the UNet input or injected it at various stages of the UNet down pass.


erwannmillon 1048 days ago

Think the final training run was only a couple hours on a Colab V100
