Techniques like NeRF let you take a bunch of photos of a real 3D scene and then generate images or video of that scene from arbitrary viewpoints, including ones no photo was taken from. NeRF uses machine learning to infer the 3D structure from the photos. So what you're seeing is a camera smoothly flying around rooms, where the video was generated (in near-real-time, I think) by an "AI" that was trained on pictures of those rooms.
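
To make the "generate images from arbitrary viewpoints" part a bit more concrete, here is a rough sketch of the rendering step as I understand it from the NeRF paper: for each pixel you shoot a ray into the scene, sample points along it, and blend the density and color at those points into one pixel color. In a real NeRF a trained neural network supplies the densities and colors; here they are hard-coded stand-ins, and `composite_ray` is just a name I made up for illustration.

```python
import math

def composite_ray(densities, colors, step):
    """Blend samples along one camera ray into a single pixel color.

    densities: how opaque the scene is at each sample, ordered front to back
    colors:    (r, g, b) at each sample
    step:      distance between neighboring samples
    (Hypothetical helper; a real NeRF gets densities/colors from a trained network.)
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light that still reaches the camera
    for sigma, rgb in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this short segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha  # light left over for samples further back
    return pixel

# Toy ray: empty space, then a solid red surface with something blue behind it.
densities = [0.0, 0.0, 50.0, 50.0]
colors = [(0, 0, 0), (0, 0, 0), (1, 0, 0), (0, 0, 1)]
print(composite_ray(densities, colors, step=1.0))
```

The red surface blocks almost all the light, so the pixel comes out essentially pure red and the blue behind it contributes next to nothing. Training amounts to adjusting the network until pixels rendered this way match the input photos, which is how the 3D structure ends up being inferred.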