Hacker News new | ask | show | jobs
by some1else 624 days ago
"Allegory of the cave" comes to mind, when trying to describe the understanding that's missing from diffusion models. I think a super-model with such qualifications would require a number of ControlNets in a non-visual domains to be able to encode understanding of the underlying physics. Diffusion models can render permutations of whatever they've seen fairly well without that, though.
1 comments

I'm very familiar with the allegory of the cave, but I'm not sure I understand where you're going with the analogy here.

Are you saying that it is not possible to learn about dynamics in a higher dimensional space from a lower dimensional projection? This is clearly not true in general.

E.g., video models learn that even though they're only ever seeing and outputting 2d data, objects have different sides in a fashio that is consistent with our 3d reality.

The distinctions you (and others in this thread) are making is purely one of degree - how much generalization has been achieved, and how well - versus one of category.