by aaroninsf 480 days ago
Can someone ELI5 what the input to these renders is?

I'm familiar with the premise of NeRF ("grab a bunch of relatively low-resolution images by walking in a circle around a subject or moving through a space") and then rendering novel viewpoints,

but on the landing page here the videos are very impressive (though the volumetric fog in the classical building is entertaining as a corner case!),

but I have no idea what the input is.

I assume if you work in this domain it's understood,

"oh these are all standard comparative output, sourced from <thing>, which if you must know are a series of N still images taken... " or "...excerpted images from consumer camera video while moving through the space" and N is understood to be 1, or more likely, 10, or 100...

...but what I want to know is,

are these video- or still-image input;

and how much/many?

2 comments

They are photos, in this case from the Mip-NeRF 360 dataset. I believe there are on the order of hundreds per scene. They are not videos turned into photos. Some datasets include high-grade position and direction information for each photo -- I believe this dataset does not, so you need to do some work to recover the camera orientations for training. But I'm a hobbyist, so all this could be very wrong.
> We optimize adaptive sparse voxels radiance field from multi-view images…

Pretty sure the input is the same as for NeRFs, GS and photogrammetry: as many high-res photos from as many angles as you have the patience to collect.

I think the example scenes are from a shared collection of photos that is widely used as a common reference point.