by throwitaway6512 2685 days ago
For all of these “this is a simulated face”-claims I wish they would show the 2-3 most similar faces in their training set. For all you know, it could just be spitting out a random training image. How would you know the difference?
3 comments

take a look at "Figure 8" from the original paper:

https://imgur.com/a/rZsWzDa

we can smoothly interpolate between faces, so it seems impossible to me that these are just memorised from the training set
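To make the interpolation argument concrete, here is a minimal sketch of how a path between two latent vectors is usually built. It assumes plain linear interpolation on vectors stored as lists of floats; `lerp` and `interpolation_path` are hypothetical helper names, not anything from the StyleGAN code, and the generator that would turn each latent into an image is not included.

```python
def lerp(z0, z1, t):
    """Linearly interpolate between two latent vectors (lists of floats)."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]


def interpolation_path(z0, z1, steps):
    """Return `steps` latent vectors evenly spaced from z0 to z1, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(z0, z1, i / (steps - 1)) for i in range(steps)]


# Toy 2-dimensional latents; a real model would use a much larger dimension.
path = interpolation_path([0.0, 0.0], [1.0, 2.0], steps=5)
```

If the model were only replaying training images, feeding each vector on such a path to the generator would be expected to produce abrupt jumps or ghostly blends rather than a sequence of plausible intermediate faces.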

I just realized that the eyes, nose, and mouth are always in the same place in these images... even though the head might rotate around them.
That might be because they cleaned the dataset thoroughly. I vaguely recall there was something about 'facial landmarks' and alignment in the ProGAN work which was presumably carried over to StyleGAN. Doubtless helps the final quality.

However, aligned faces are definitely not required - I didn't do any kind of alignment for my anime faces and you can see the eyes/nose/mouth in all sorts of positions in the samples & videos.

This is usually done: most GAN papers show, in the appendix, a list of generated images with their distances to images in the dataset. For example, see pages 14 to 16 of this GAN paper [1]. Note that measuring distances between images is not trivial and some measurement space must be chosen, typically cosine distance on the last ResNet feature map.

[1] https://openreview.net/pdf?id=B1xsqj09Fm
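A rough sketch of that nearest-neighbour check, assuming each image has already been turned into a feature vector (for instance by a pretrained ResNet, which is not shown here). The function names are hypothetical, and this is plain brute-force search rather than the exact procedure of any particular paper.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def nearest_training_images(query_feat, train_feats, k=3):
    """Return (index, distance) for the k training features closest to the query."""
    scored = [(cosine_distance(query_feat, f), i) for i, f in enumerate(train_feats)]
    scored.sort()  # ascending distance, ties broken by index
    return [(i, d) for d, i in scored[:k]]


# Toy 2-dimensional features standing in for real network activations.
train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbours = nearest_training_images([1.0, 0.1], train, k=2)
```

Showing the generated face next to the images at the returned indices is what lets a reader judge whether the sample is a near copy of something in the training set.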

You can download the training set here: https://github.com/NVlabs/stylegan#resources (it's based on 70k high-resolution Flickr images). Try interpolation on the colab link above so you can be convinced it's capturing the features rather than just memorizing.