by dheera 865 days ago

I've done a bunch of experiments on my own on the Stable Diffusion VAE.

Even when going down to 4-6 bits per latent space pixel the results are surprisingly good.
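For anyone who wants to try the bit-depth part, here is a rough sketch of what reducing latent precision could look like. This is not the exact scheme used in the experiments above: it assumes plain uniform min/max quantization applied per value, and it uses random numbers as a stand-in for a real 4x64x64 latent. The helper names `quantize` and `dequantize` are made up for illustration, and no VAE is actually involved.

```python
import random

def quantize(values, bits):
    """Uniformly quantize floats to integer codes in [0, 2**bits - 1]."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # Guard against a constant input, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]

# Stand-in for a flattened 4x64x64 latent (assumption: a real one would
# come from the VAE encoder, which is not shown here).
random.seed(0)
latent = [random.gauss(0.0, 1.0) for _ in range(4 * 64 * 64)]

for bits in (4, 5, 6, 8):
    codes, lo, scale = quantize(latent, bits)
    restored = dequantize(codes, lo, scale)
    max_err = max(abs(a - b) for a, b in zip(latent, restored))
    print(f"{bits} bits: step {scale:.4f}, max abs error {max_err:.4f}")
```

The worst-case error per value is half a quantization step, so each extra bit roughly halves it. How that error shows up in the decoded image depends on the decoder, which this sketch does not model.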

It's also interesting what happens if you ablate individual channels; ablating channel 0 results in faithful color but shitty edges, ablating channel 2 results in shitty color but good edges, etc.
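A minimal sketch of the ablation step, for illustration only. It assumes "ablating" means overwriting one latent channel with a constant (zero here) before decoding; filling with the channel mean would be another reasonable choice, and the comment above does not say which was used. The latent is a plain nested list in [channel][row][column] order, and `ablate_channel` is a hypothetical helper, not part of any library.

```python
def ablate_channel(latent, channel, fill=0.0):
    """Return a copy of a [C][H][W] latent with one channel set to `fill`."""
    return [
        [[fill] * len(row) for row in plane] if c == channel
        else [row[:] for row in plane]
        for c, plane in enumerate(latent)
    ]

# Tiny 4-channel, 2x2 stand-in latent (assumption: a real one would come
# from the VAE encoder and be much larger).
latent = [[[float(c * 10 + y * 2 + x) for x in range(2)] for y in range(2)]
          for c in range(4)]

no_ch0 = ablate_channel(latent, 0)  # would be decoded to inspect edges
no_ch2 = ablate_channel(latent, 2)  # would be decoded to inspect color
print(no_ch2[2])
```

Each ablated latent would then go through the VAE decoder and be compared against the unmodified reconstruction.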

The one thing it fails catastrophically on though is small text in images. The Stable Diffusion VAE is not designed to represent text faithfully. (It's possible to train a VAE that does slightly better at this, though.)

1 comments

3abiton 865 days ago

How does the type of image (anime vs. photorealistic vs. painting, etc.) affect the compression results? Is there a noticeable difference?


dheera 865 days ago

I haven't noticed much difference between these. They're all well-represented in the VAE training set.
