by Blammar 940 days ago
The thing that is truly mindboggling to me is that THE SHADOWS IN THE IMAGES ARE CORRECT. How is that possible??? Does DALL-E actually have a shadow-tracing component?
4 comments

Research into the internals of these networks has shown that they work out the correct 2.5D representation of the scene internally before the RGB textures, so yes, it seems they have an internal representation of the scene and can infer enough from it to make shadows and light look natural.

I guess it's not that far-fetched, as your brain has to do the same to figure out whether a scene (or an AI-generated one, for that matter) has some weird issue that should pop out. So in a sense your brain does this too.

Interesting! Do you have a link to that research?
Certainly: https://arxiv.org/abs/2306.05720

It's a very interesting paper.

"Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process−well before a human can easily make sense of the noisy images."

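For anyone unfamiliar with the "linear probes" mentioned in the abstract, here is a minimal sketch of the idea, not the paper's actual code. It uses synthetic data in place of real LDM activations: `acts`, `true_dir` and `depth` are made-up stand-ins, and the probe is just a least-squares fit checked on held-out samples. If a linear map from activations to depth generalizes, the information is linearly encoded.

```python
import numpy as np

def fit_linear_probe(acts, target):
    """Least-squares fit of target ~ acts @ w + b; returns (w, b)."""
    # Append a column of ones so the bias is learned with the weights.
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[:-1], coef[-1]

def r_squared(pred, target):
    """Fraction of the target's variance explained by the prediction."""
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in (assumption): 2000 "pixels", 64 activation channels,
# with depth constructed to be a noisy linear function of the activations.
rng = np.random.default_rng(0)
n, c = 2000, 64
acts = rng.normal(size=(n, c))
true_dir = rng.normal(size=c)
depth = acts @ true_dir + 0.1 * rng.normal(size=n)

# Fit on one split, evaluate on the other.
train, test = slice(0, 1500), slice(1500, None)
w, b = fit_linear_probe(acts[train], depth[train])
probe_r2 = r_squared(acts[test] @ w + b, depth[test])
print(f"held-out R^2: {probe_r2:.3f}")
```

With a real model you would replace `acts` with activations captured from an intermediate layer and `depth` with depth labels for the same pixels. Here the high score is guaranteed by construction, so it only demonstrates the mechanics.
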
What does 2.5D mean?
You usually say 2.5D when it's 3D but only from a single vantage point, with no info about the back-facing side of objects. Like the representation you get from a depth sensor on a mobile phone, or when trying to extract depth from a single photo.
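
To make that concrete, a rough sketch assuming a simple pinhole camera: a depth map stores one distance per pixel, and back-projecting it gives exactly one 3D point per pixel. The function name `backproject` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative choices, not taken from any particular library.

```python
def backproject(depth, fx, fy, cx, cy):
    """Turn a depth map (rows of per-pixel depths) into camera-space 3D points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Pinhole model: pixel (u, v) at depth z maps to (x, y, z).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points

# A 3x3 depth map: a flat wall 2 units away with something closer in the middle.
depth_map = [
    [2.0, 2.0, 2.0],
    [2.0, 1.0, 2.0],
    [2.0, 2.0, 2.0],
]
points = backproject(depth_map, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
print(len(points))  # one point per pixel, nothing behind the visible surface
```

Each pixel contributes only the nearest visible surface, so the back of the object and anything it occludes is simply missing. That gap is the missing ".5".
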
It means you should be worried about the guy she told you not to worry about
I randomly checked a few links here and shadows were correct in 2 images out of a dozen... and the people tend to be horrifying in many of them
Stable Diffusion does decent reflections too
Yes! It can also get reflections and refractions mostly correct.