The only artifact I can see is that straight lines, e.g. moldings, become wavy. Not very jittery, but consistently wavy. Could adding some line detection to force-straighten them help? Perhaps the same would apply to ellipses in images with large circles. These interior videos had more straight lines, though.
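To make the force-straighten idea concrete, here is a toy sketch, not anything from the paper. It assumes some edge detector has already handed you 2D points along a molding that should be straight (that step is not shown), and it snaps them onto their total-least-squares best-fit line. The function name `straighten` and the sample data are made up for illustration.

```python
import math

def straighten(points):
    """Project 2D points onto their total-least-squares best-fit line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 scatter matrix around the centroid.
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Angle of the principal axis, i.e. the best-fit line direction.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    snapped = []
    for x, y in points:
        # Signed distance along the line, measured from the centroid.
        t = (x - mx) * dx + (y - my) * dy
        snapped.append((mx + t * dx, my + t * dy))
    return snapped

# Hypothetical wavy edge: a horizontal molding with a small sinusoidal wobble.
wavy = [(float(x), 0.05 * math.sin(x)) for x in range(20)]
straight = straighten(wavy)
```

Doing this per frame would probably trade waviness for flicker unless the fitted lines were kept consistent across views, so treat it as an illustration of the idea rather than a fix.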
I think a bigger problem is how much compute NeRF models need. From the paper:
"Our model, Mip-NeRF 360, and our 'mip-NeRF 360 + iNGP' baseline were all trained on 8 NVIDIA Tesla V100-SXM2-16GB GPUs."
Even with all that, it takes almost an hour to train. A developer would need to do this for every scene, which may make it infeasible for indies, the ones who would benefit the most from NeRF in gaming.
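Back-of-the-envelope, using only the numbers above (8 GPUs, and treating "almost an hour" as a 1-hour upper bound). The single-GPU figure assumes perfect linear scaling and that the model fits on one card, neither of which is stated in the paper, so read it as a rough guess.

```python
def gpu_hours(num_gpus, wall_clock_hours):
    """Total GPU time consumed by one training run."""
    return num_gpus * wall_clock_hours

def single_gpu_wall_clock(num_gpus, wall_clock_hours):
    """Wall-clock hours on one GPU, ASSUMING perfect linear scaling."""
    return gpu_hours(num_gpus, wall_clock_hours) / 1

# 8 V100s for at most about 1 hour per scene.
per_scene = gpu_hours(8, 1.0)
one_card = single_gpu_wall_clock(8, 1.0)
print(f"up to ~{per_scene:.0f} GPU-hours per scene")
print(f"roughly {one_card:.0f} h on a single comparable GPU (idealized)")
```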
The better question is how many person-hours it would take to model a scene like that with alternative approaches, and whether a NeRF can be used to create outputs that could be used in a game engine.
To clarify for people who don't follow NeRF techniques, this research is not prompt-based. The algorithm captures the 3D scene from real-life images. There is some super promising work on mixing NeRF-based techniques with various generative models to create 3D objects from prompts, but it doesn't seem close to creating anything at this kind of scale or detail yet. I do agree this is a future possibility, though.
Techniques like NeRF allow you to take a bunch of photos of a real 3D scene and then generate images/video of the scene from arbitrary viewpoints, with NeRF inferring the 3D structure using machine learning. So what you're seeing is the camera smoothly flying around rooms where the video was generated (in near-real-time, I think) by an "AI" that was trained on pictures of the rooms.
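For the curious, here is a minimal sketch of how a single pixel gets rendered from a new viewpoint, following the volume-rendering sum used in the original NeRF paper. In a real NeRF the density (`sigma`) and color at each sample along the camera ray come from the trained network; here they are hard-coded placeholders, and the function name `composite` is mine.

```python
import math

def composite(sigmas, colors, deltas):
    """Alpha-composite samples along one camera ray, front to back.

    sigmas: density at each sample; colors: (r, g, b) at each sample;
    deltas: distance between consecutive samples.
    """
    transmittance = 1.0  # fraction of light that still reaches the camera
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        # Opacity contributed by this segment of the ray.
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * color[k]
        # Whatever was not absorbed here passes on to the next sample.
        transmittance *= 1.0 - alpha
    return pixel, transmittance

# Placeholder ray: empty space, thin green haze, then a dense red wall.
sigmas = [0.0, 0.1, 50.0]
colors = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
deltas = [1.0, 1.0, 1.0]
pixel, leftover = composite(sigmas, colors, deltas)
```

Repeat that for every pixel of every frame and you get the fly-through video. Training amounts to adjusting the network until pixels rendered this way match the input photos.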