Hacker News
by dheera 817 days ago
I've done a bunch of experiments on my own on the Stable Diffusion VAE.

Even when quantizing the latents down to 4-6 bits per latent-space pixel, the results are surprisingly good.
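
For anyone who wants to try the low-bit idea, here is a rough sketch of uniform quantization in plain Python. It is not my actual experiment code. It assumes "bits" means bits per individual latent value and that the min/max range is taken from the latent itself; `quantize_latents` and `dequantize_latents` are made-up names, and getting the latents out of the VAE and decoding them afterwards is left out.

```python
def quantize_latents(values, bits):
    """Uniformly quantize a flat list of floats to 2**bits integer codes.

    Assumption: the range is the min/max of the values themselves.
    Returns (codes, lo, scale) so the values can be reconstructed later.
    """
    if bits < 1:
        raise ValueError("bits must be >= 1")
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1  # highest integer code, e.g. 15 for 4 bits
    if hi == lo:
        # Constant input: every value maps to code 0.
        return [0] * len(values), lo, 0.0
    scale = (hi - lo) / levels
    # Round to the nearest level and clamp against float rounding drift.
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, lo, scale


def dequantize_latents(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


# Hypothetical latent values, flattened; real ones would come from the VAE encoder.
latent = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes, lo, scale = quantize_latents(latent, bits=4)
restored = dequantize_latents(codes, lo, scale)
```

The reconstruction error per value is at most half a quantization step, so each extra bit roughly halves it.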

It's also interesting what happens if you ablate individual channels; ablating channel 0 results in faithful color but shitty edges, ablating channel 2 results in shitty color but good edges, etc.
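
A minimal sketch of what channel ablation can look like, again in plain Python rather than real experiment code. It assumes "ablate" means setting one channel to zero (replacing it with its mean would be another reasonable choice) and that the latent is a nested list shaped [channels][height][width]; `ablate_channel` is a made-up helper, and the decode step is not shown.

```python
def ablate_channel(latent, channel):
    """Return a copy of a [C][H][W] nested-list latent with one channel zeroed.

    Assumption: ablation means zeroing the channel. The input is not modified.
    """
    if not 0 <= channel < len(latent):
        raise IndexError("channel out of range")
    result = []
    for c, plane in enumerate(latent):
        if c == channel:
            # Replace every value in the ablated channel with 0.0.
            result.append([[0.0 for _ in row] for row in plane])
        else:
            # Copy the other channels unchanged.
            result.append([row[:] for row in plane])
    return result


# Hypothetical 4-channel, 2x2 latent (the Stable Diffusion VAE latent has 4 channels).
latent = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
no_ch0 = ablate_channel(latent, 0)  # decode this to see the effect of dropping channel 0
no_ch2 = ablate_channel(latent, 2)  # decode this to see the effect of dropping channel 2
```

Decoding each ablated latent and comparing it with the unablated decode is how you would see which channel carries what.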

The one thing it fails catastrophically on though is small text in images. The Stable Diffusion VAE is not designed to represent text faithfully. (It's possible to train a VAE that does slightly better at this, though.)

1 comment

How does the type of image (anime vs. photorealistic vs. painting, etc.) affect the compression results? Is there a noticeable difference?
I haven't noticed much difference between these. They're all well-represented in the VAE training set.