by ssivark 2289 days ago
Huh, what? It needs almost a million views, and takes 1-2 days to train on a GPU. I’m not sure where the “5 minutes” number comes from.

EDIT: I was referring to the last paragraph of section 5.3 (Implementation details), but maybe I’m misunderstanding how they use rays / sampled coordinates.

Very impressive visual quality. But it seems like they need a LOT of data and computation for each scene. So it's still plausible that intelligently done photogrammetry will beat this approach in efficiency, but a bunch of important details need to be figured out to make that happen.

Excuse me, I meant 5MB. It takes 12 hours to train.

>All compared single scene methods take at least 12 hours to train per scene

But it seems to only need sparse images.

>Here, we visualize the set of 100 input views of the synthetic Drums scene randomly captured on a surrounding hemisphere, and we show two novel views rendered from our optimized NeRF representation

> It needs almost a million views

Not sure what you mean by "views". The comparisons in the paper use at most 100 input images per scene.

A pixel is one view for their model if I understand correctly, so one hundred 100x100 images would be a million views.
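
To make that arithmetic concrete, here is a minimal sketch, assuming each pixel counts as one "view" (one ray) in the sense used above. The function name `total_pixel_samples` and the 100x100 resolution are illustrative assumptions taken from this comment, not from the paper.

```python
def total_pixel_samples(num_images: int, height: int, width: int) -> int:
    """Count per-pixel samples, assuming one sample (ray) per pixel per image."""
    return num_images * height * width


# One hundred 100x100 images, as in the comment above.
samples = total_pixel_samples(num_images=100, height=100, width=100)
print(f"{samples:,} pixel samples")  # 1,000,000 pixel samples
```

So "100 input images" and "about a million views" can both be true at once: the first counts images, the second counts pixels.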