They're probably fusing both lenses with the Lidar and some other tricks to reliably compute a dense surface. That would explain their suggestion not to move the camera very much, as that would cause a large portion of the mesh to be rebuilt. A blogger exported what appear to be two side-by-side videos, so maybe the view really is narrow or reconstruction happens at playback. There might also be Lidar data in there that he didn't notice.
Apple bought C3 Technologies a decade ago, and they use this technique to fuse photos from low flying charters to produce the 3d view in Apple Maps.
Pure speculation: when combined with the LIDAR depth sensor, the two cameras probably don't need as much physical separation to accurately create a depth map. The bigger problem is the inpainting needed to generate hidden detail when the movie is viewed from angles that are different from the one it was actually filmed from.
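To put rough numbers on why separation matters, here is a minimal sketch of the standard rectified pinhole stereo relation Z = f * B / d and its first-order depth error. The focal length in pixels, the two baselines, and the half-pixel matching error are illustrative assumptions, not measured iPhone values, and nothing here is claimed to be Apple's actual pipeline.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Rectified pinhole stereo: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px: float, baseline_m: float, depth_m: float,
                disparity_err_px: float) -> float:
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


# Assumed, illustrative numbers (not real device specs).
FOCAL_PX = 1500.0
DISPARITY_ERR_PX = 0.5

if __name__ == "__main__":
    for baseline_m in (0.02, 0.065):
        err_m = depth_error(FOCAL_PX, baseline_m, 2.0, DISPARITY_ERR_PX)
        print(f"baseline {baseline_m * 1000:.0f} mm: +/- {err_m * 100:.1f} cm at 2 m")
```

Under these assumptions the depth error grows inversely with the baseline, which is where an independent depth measurement such as lidar could plausibly compensate for closely spaced lenses.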
My understanding is that very few consumer lidar sensors work well in daylight. It's hard to send out & detect significantly meaningful pulses of light, when there's sunlight all around.
I have an Intel L515, which is pretty remarkable in that you can sometimes get some depth readings outdoors. This is just a hobby item for me and I'm not an expert, but it launched as a fairly impressive, long-range and capable $350 USB3 system, and the market doesn't seem to have much comparable to it. I'd certainly expect phones to be significantly worse.
>My understanding is that very few consumer lidar sensors work well in daylight. It's hard to send out & detect significantly meaningful pulses of light, when there's sunlight all around.
Aren't many "self driving car" sensors lidar? This would imply they can work in daylight - perhaps they don't necessarily depend on light within the sunlight spectrum?
(Or perhaps you don't consider them consumer? Those cars are consumer products, though; they're not made for military or industrial use.)
Many cars do use lidar, but they use much stronger, bigger, higher-power lasers on very expensive and precise rotating assemblies.
The L515 I mentioned was somewhat advanced, at least for its day, because it used MEMS to steer its light source. That gave it class-leading performance for its size, but it's still big and kinda hot-ish. Maybe we can keep scaling that kind of system performance to smaller sizes, but even this package was pretty cutting edge & gave much better falloff than many competing systems, and it was still largely an indoor sensor.
>The bigger problem is the inpainting needed to generate hidden detail when the movie is viewed from angles that are different from the one it was actually filmed from.
It's for spatial video, not for holographic video. When you see a 3d movie in a cinema, it's not like you can look at it from widely different angles and go peek from the side or behind the actors or whatever...
Given that iPhone cameras are ~2.5 cm apart, there needs to be some amount of in-painting when building the stereo image, so that it looks like it was taken with cameras that are ~6.5 cm apart.
I was wondering about the use of the lidar sensor. Notably they do not say they are using it, but maybe they just wanted to keep it simple? Idk seems weird not to use lidar but also seems weird not to mention it if they are using it.
But if you have eyes 50mm apart, and source material from cameras 15mm apart (plus other depth information), you'll need to in-paint a small amount where your eyes could see "around" something and the cameras can't.
Or you can make do with 15mm-apart worth of "around"?
It's still more "spatial"/3D than a regular (single lens) image.
Plus this has a wide lens and a "regular" lens (actually both wide iirc but one is ultrawide), so it's not like 2 equal lenses 50mm apart like in regular stereoscopic "3d" video.
> Or you can make do with 15mm-apart worth of "around"?
You need to move the close objects further apart in left/right than they are in the camera. Then you need to fill the newly empty areas with something.
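A toy sketch of that step, on a single scanline: each pixel is shifted in proportion to its disparity to simulate a wider virtual baseline, nearer pixels win when two land on the same spot, and whatever is left empty is what would need in-painting. The function name `widen_baseline`, the sample row, and the scale factor are all hypothetical; real view synthesis is considerably more involved.

```python
def widen_baseline(row, disparity, scale):
    """Forward-warp one scanline to a virtual camera with `scale` times the baseline.

    row:       pixel values for one scanline
    disparity: per-pixel disparity (in pixels) measured at the captured baseline
    scale:     virtual baseline divided by captured baseline (> 1 widens)

    Returns the warped scanline, with None marking holes that need in-painting.
    """
    width = len(row)
    out = [None] * width
    out_disp = [-1.0] * width  # simple z-buffer: larger disparity means closer
    for x, (value, d) in enumerate(zip(row, disparity)):
        new_x = x + round((scale - 1.0) * d)  # extra shift beyond the captured view
        if 0 <= new_x < width and d > out_disp[new_x]:
            out[new_x] = value
            out_disp[new_x] = d
    return out


# Hypothetical scene: a near object "F" (disparity 2) in front of a far background "b".
row = ["b", "b", "F", "F", "b", "b", "b", "b"]
disparity = [0, 0, 2, 2, 0, 0, 0, 0]

if __name__ == "__main__":
    print(widen_baseline(row, disparity, scale=2.5))
```

With these made-up values the foreground object slides three pixels over and covers background that was visible before, while the two positions it vacated come back as None: that is the newly empty area that has to be filled with something.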
> Plus this has a wide lens and a "regular" lens (actually both wide iirc but one is ultrawide), so it's not like 2 equal lenses 50mm apart like in regular stereoscopic "3d" video.
> The bigger problem is the inpainting needed to generate hidden detail when the movie is viewed from angles that are different from the one it was actually filmed from.
Early reviews indicate it is, as some reviewers have had access to spatial video taken from a phone, but I’m not sure if those were ideal conditions or just ad-hoc.
Two focal lengths at the same physical distance to the subject have exactly the same perspective (i.e. if you crop them to the same area they will look the same). There is no extra information to be had from that.
Depth information can be obtained from differences in the angular position/size of objects within the cameras' FOV. There's a reason a photo taken with a 28mm doesn't look the same as one taken with a 50mm from a few steps back.
Exactly. The steps back change perspective, not the lenses. That’s what I was trying to say above. In the iPhone both lenses are at the same distance to the subject.
One of the two cameras is the ultra-wide camera, so it gets more parallax and visual information than the separation from the other camera alone would provide.
That’s not how parallax works. The wider field of view of the ultra-wide camera will show some of the scene that the other camera doesn’t see, but over the overlapping parts of the scene the parallax is a strict function of the location of the two lenses’ entrance pupils.
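Both points can be checked with an idealized pinhole model (no lens distortion, which a real ultra-wide certainly has). The sketch below assumes hypothetical focal lengths of 13 and 24 units and a 2 cm lens separation: from the same position, the two focal lengths differ only by a uniform scale, and between two positions the parallax, once divided by focal length, is B / Z no matter which lens is used.

```python
import math


def project(point, cam_x, focal):
    """Pinhole projection of (X, Y, Z) for a camera at (cam_x, 0, 0) looking down +Z."""
    X, Y, Z = point
    return (focal * (X - cam_x) / Z, focal * Y / Z)


def normalized(point, cam_x, focal):
    """Image coordinates divided by focal length, i.e. tangents of the viewing angles."""
    u, v = project(point, cam_x, focal)
    return (u / focal, v / focal)


def parallax(point, baseline, focal_a, focal_b):
    """Normalized horizontal shift between a camera at x=0 and one at x=baseline."""
    xa, _ = normalized(point, 0.0, focal_a)
    xb, _ = normalized(point, baseline, focal_b)
    return xa - xb


# Assumed values for illustration only.
WIDE, MAIN, BASELINE_M = 13.0, 24.0, 0.02
near, far = (0.1, 0.0, 0.5), (0.1, 0.0, 5.0)

if __name__ == "__main__":
    # Same position, different focal lengths: identical once scaled (cropped).
    print(normalized(near, 0.0, WIDE), normalized(near, 0.0, MAIN))
    # Different positions: parallax depends on baseline and depth, not focal length.
    print(parallax(near, BASELINE_M, WIDE, MAIN), parallax(near, BASELINE_M, MAIN, MAIN))
    print(math.isclose(parallax(far, BASELINE_M, WIDE, MAIN), BASELINE_M / far[2]))
```

In this model the only thing the wider lens adds is coverage outside the other camera's frame; inside the overlap, the depth cue comes entirely from the separation.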
Apple bought C3 Technologies a decade ago, and they use this technique to fuse photos from low flying charters to produce the 3d view in Apple Maps.
[ Paper: https://ui.adsabs.harvard.edu/abs/2008SPIE.6946E..0DI/abstra... ]
[ Coverage: https://9to5mac.com/2011/10/29/apple-acquired-mind-blowing-3... ]
[ Similar: https://web.stanford.edu/class/ee367/Winter2021/projects/rep... ]