
No, NeRFs are more interpretable because they directly model a volumetric field of density and emitted color, i.e. how much light each point in space absorbs and emits. In this respect they are akin to a neural version of photogrammetry. They don’t need to train on a large corpus of images, because they are fit per scene, directly from a collection of posed images of that scene.
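
To make the "absorb and emit" part concrete, here is a rough sketch of the usual volume-rendering quadrature along a single ray. The function name `render_ray` and its arguments are made up for illustration. In a real NeRF the densities and colors would come from a network queried at the sample points, whereas here they are passed in as plain lists.

```python
import math

def render_ray(densities, colors, deltas):
    """Composite density and RGB samples along one ray, front to back.

    densities: non-negative density at each sample (hypothetical inputs)
    colors:    (r, g, b) emitted at each sample
    deltas:    length of the ray segment each sample represents
    """
    transmittance = 1.0  # fraction of light not yet absorbed before this sample
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution of this sample
        for c in range(3):
            pixel[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha            # light left for samples behind
    return tuple(pixel), transmittance

# Empty space followed by an effectively opaque red surface.
pixel, remaining = render_ray(
    densities=[0.0, 1e9],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
    deltas=[1.0, 1.0],
)
```

The interpretability comes from the fact that every quantity in the loop has a physical reading: the density says where the matter is, and the per-sample weight says which point along the ray the pixel color came from.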

On the other hand, diffusion models can learn fairly arbitrary distributions of signals, so by exploiting this learned prior together with view consistency, they can be much more sample-efficient than ordinary NeRFs, i.e. they need far fewer input views. Without such a learned prior, 3D reconstruction from a single image is extremely ill-posed (much like monocular depth estimation).
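
A toy illustration of why the single-image case is ill-posed, assuming an idealized pinhole camera (the `project` helper and the numbers are hypothetical): any point along the same viewing ray lands on the same pixel, so one image alone cannot say how far away it is.

```python
def project(point, focal=1.0):
    """Pinhole projection of a camera-space point (x, y, z) with z > 0."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

near = (0.5, 0.25, 2.0)
far = tuple(4.0 * c for c in near)  # same viewing ray, four times as far away

# Both points map to the same image coordinates, so depth is unobservable
# from this one view; a second posed view or a learned prior is needed.
same_pixel = project(near) == project(far)
```

A second posed view breaks the tie geometrically, which is what NeRFs rely on. A learned prior breaks it statistically, by preferring the depths that plausible scenes tend to have.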