It's trained on a mixture of publicly available face datasets. The final model is several gigabytes in size, which is fairly large. In fact, that's one of the reasons we made this tool: to test our infrastructure with larger generative models.
The model learns the features that make a convincing face and generates a synthetic face from them (controlled by the segment map), similar to NVIDIA's GauGAN.
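
To give a rough idea of what "controlled by the segment map" can mean in practice: GauGAN-style generators commonly receive the map as one binary mask per class rather than as a single image of class ids. The snippet below is only a sketch of that encoding step, not this tool's actual code. The label names in `LABELS` and the helper `one_hot` are made up for illustration, since the real class list isn't described here.

```python
# Hypothetical label set; the tool's real classes are not given in the text.
LABELS = ["background", "skin", "hair", "eyes", "mouth"]


def one_hot(segment_map, num_classes):
    """Turn an H x W grid of class ids into num_classes binary H x W masks."""
    return [
        [[1 if cell == c else 0 for cell in row] for row in segment_map]
        for c in range(num_classes)
    ]


# A tiny 3 x 4 segment map: each number is an index into LABELS.
segment_map = [
    [0, 2, 2, 0],
    [0, 1, 1, 0],
    [0, 1, 4, 0],
]

# One mask per class; a generator would be conditioned on this stack.
masks = one_hot(segment_map, len(LABELS))
hair_mask = masks[LABELS.index("hair")]
```

Each pixel is set in exactly one mask, so redrawing a region of the segment map (say, enlarging the hair area) changes which mask that pixel belongs to, and that is what steers the generated face.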
The power of generative models lies in being able to flexibly generate images that convincingly belong to a set (e.g. faces, landscapes) but that are not actually in the input dataset. For example, you can make images that look like faces but don't belong to any real individual.