Hacker News
by kken 810 days ago
Generally, the VAE is mapping from a small latent space to a large image space. This means that there must be a large number of images for which no reverse mapping exists.

It should be possible to identify images that have not been generated by the VAE, since they are not part of the set of images that the VAE can generate. The other way round is a bit more difficult, as there may be images that can be mapped to the latent space and back without loss but have been generated in another way

-> there may be false positives.
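
A toy sketch of the round-trip test described above, not a real VAE: the "decoder" here is a made-up linear map from a 1-D latent to a 3-pixel "image", and `DIRECTION`, `decode`, `encode`, and `could_be_generated` are hypothetical names chosen for illustration. It only shows the shape of the argument: anything outside the decoder's range fails the round trip, while an image made another way that happens to lie in the range passes (a false positive).

```python
import math

# Hypothetical toy decoder: a 1-D latent z is mapped to a 3-pixel "image".
# A real VAE is nonlinear and lossy; this is only an illustration.
DIRECTION = [1.0, 2.0, 2.0]


def decode(z):
    """Map a latent value to an image (small space -> large space)."""
    return [z * d for d in DIRECTION]


def encode(image):
    """Least-squares projection of an image onto the decoder's range."""
    num = sum(x * d for x, d in zip(image, DIRECTION))
    den = sum(d * d for d in DIRECTION)
    return num / den


def round_trip_error(image):
    """Euclidean distance between an image and its encode/decode round trip."""
    recon = decode(encode(image))
    return math.sqrt(sum((x - r) ** 2 for x, r in zip(image, recon)))


def could_be_generated(image, tol=1e-9):
    """True if the image survives the round trip without loss."""
    return round_trip_error(image) <= tol


generated = decode(0.5)        # produced by the toy decoder
lookalike = [3.0, 6.0, 6.0]    # made "by hand", but lies in the decoder's range
outside = [1.0, 0.0, 0.0]      # no latent maps to this image

for name, img in [("generated", generated), ("lookalike", lookalike), ("outside", outside)]:
    print(name, could_be_generated(img), round(round_trip_error(img), 4))
```

With a real VAE the round trip is never exactly lossless, so the tolerance would have to be tuned, and that tuning is where the false-positive rate comes from.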

1 comments

This logic has a key flaw: the fact that the two spaces differ in size doesn't mean that every representable thing in the larger space is something we care about. E.g. a person with three hands may not have a representation in the smaller space, but we would never care about that. The actual question is: how does the difference in information between a large image and the small latent space compare to the difference in information between a large image and a small image? If those two differences are close enough, reliably distinguishing SD-generated images from others becomes nearly impossible.
The logic is still the same. If the VAE is trained so that it is biased toward human preference, then the probability of false positives in real-world images would increase.
Yes, otherwise cryptographic hashes wouldn't work (they are not bijective).