by poolio 1357 days ago
I think you're both right! It is incredible that the 2D model knows enough about the visual world to produce many objects from all angles, but the 3D model is essential for gluing these views together, and in some ways can fill in the gaps the 2D model doesn't know about. Imagine just taking a huge collection of photographs of an object. While there is enough information in those photos to reconstruct the 3D object, I wouldn't personally call that collection of images "an understanding of 3D." In our case, the diffusion model is the collection of photos and the NeRF model + optimization procedure is what figures out how all those photos can be related to a shared underlying 3D representation. - ben p (author)