by Hnus
1415 days ago
> enables conditional generation of 3D scenes from different modalities like text or RGB images.

Please help me with a few dumb questions I have:

- What exactly is used as the input to generate such scenes? Is it just a few pictures, or even a text description?
- Is it able to generate data for something that was not in the input? For example, if some common object is only partly visible in the corner of your photo, can it expand the picture as if the whole object had been in the frame in the first place?
- What is the end game of technologies like these? Could it one day be fed, say, every piece of data Google has about the world (every 360° picture, every book, article, video, movie and so on), allowing you to take a picture of something and spawn an infinitely walkable world that looks and behaves like our reality, similar to a procedurally generated video game map?
I don't think so? It just reconstructs the space it sees, but it could absolutely be extended to fill in the gaps, so to speak.

As for the end game, robotic navigation and manipulation of the environment would be my immediate guess. A robot could build a complete 3D version of the world and recognize the objects in it. Your idea could become a reality here as well.
CVPR 2022 had a lot of interesting work on 3D scene reconstruction. One paper I recall reached into a database of CAD objects and simply replaced the objects in the scene with CAD models that closely matched what was shown. This could mean that a robot armed with this type of computer vision could manipulate every object it sees and know exactly how to interact with it without further examination.
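To make the "replace scene objects with matching CAD models" idea a bit more concrete, here is a minimal sketch of just the retrieval step, assuming each detected object and each CAD model has already been turned into a fixed-length feature vector. The descriptors, the model names (`chair_01` etc.) and the function `retrieve_cad_models` are all hypothetical, and cosine-similarity nearest neighbour is a stand-in I picked for illustration, not the method of the paper mentioned above. A real system would use learned 3D/image embeddings and would also have to estimate each object's pose and scale before swapping it in.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_cad_models(detected, cad_database):
    """For each detected object descriptor, return the name of the most similar CAD model.

    Hypothetical helper: `detected` maps object ids to descriptors and
    `cad_database` maps CAD model names to descriptors of the same length.
    """
    matches = {}
    for obj_id, descriptor in detected.items():
        best_name = max(
            cad_database,
            key=lambda name: cosine_similarity(descriptor, cad_database[name]),
        )
        matches[obj_id] = best_name
    return matches

# Toy, made-up 3-dimensional descriptors purely for illustration.
cad_database = {
    "chair_01": [1.0, 0.1, 0.0],
    "table_03": [0.0, 1.0, 0.2],
    "lamp_07": [0.1, 0.0, 1.0],
}
detected = {
    "object_a": [0.9, 0.2, 0.1],
    "object_b": [0.1, 0.8, 0.3],
}

if __name__ == "__main__":
    print(retrieve_cad_models(detected, cad_database))
    # {'object_a': 'chair_01', 'object_b': 'table_03'}
```

Once every detected object is mapped to a known CAD model, a robot could look up that model's geometry (and any annotations stored with it) instead of reasoning from raw pixels, which is roughly why this kind of retrieval is attractive for manipulation.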