That would indeed be an interesting thing to try: using real data, but only its textures, so that effects like occlusion, perspective, etc. would be absent.

I would expect it to perform somewhere in the ballpark of our StyleGAN images, which also look very "textural" but lack these effects that result from imaging the 3D world. Interestingly, modelling these effects without realistic textures seems to result in worse performance. This is the case, for example, for images taken from CLEVR or generated from Minecraft, both of which perform worse than the StyleGAN images.