by ViktorV 2638 days ago
I don't think it uses GPS for that. You can match two partially overlapping images. From the movement of the feature points in the overlapping areas you can calculate the relative camera position and orientation, and thus the 3D point cloud of feature points (or something close to this). I think it is called structure from motion; the README links a paper that may be worth reading.
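
To make the "camera pose, then point cloud" step concrete, here is a minimal sketch of midpoint triangulation: once two camera centres and the viewing rays toward one matched feature are known, the 3D point is taken as the midpoint of the shortest segment between the two rays. The function name and the toy numbers are made up for illustration, and this is not necessarily what the linked project does; real pipelines also have to estimate the poses and usually refine everything with bundle adjustment.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Return the 3D point closest to two viewing rays (midpoint method).

    c1, c2: camera centres; d1, d2: ray directions toward the same feature.
    All arguments are 3-element sequences in a shared world frame.
    """
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    w = [p - q for p, q in zip(c1, c2)]          # vector from c2 to c1
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays: the feature did not move between the images,
        # so there is no parallax to recover depth from.
        raise ValueError("rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom                  # distance parameter on ray 1
    t = (a * e - b * d) / denom                  # distance parameter on ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2 for u, v in zip(p1, p2)]


# Toy example: two cameras 2 m apart, both looking at a point 5 m ahead.
point = triangulate_midpoint([-1, 0, 0], [1, 0, 5], [1, 0, 0], [-1, 0, 5])
```

With noisy matches the two rays no longer intersect exactly, which is why the midpoint of the closest approach is used rather than an exact intersection.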
2 comments

Interesting, is there a technical reason for not using that extra data to help with the process?

For example, another commenter mentioned

> Some materials don't really contain any surface details that the algorithms could use to attach feature points to, so they will be blank. Large white walls and large windows are especially difficult.

Seems like you might be able to position some of these with orientation data from the camera.

The GPS data is probably being used as a prior. GPS metadata is accurate but imprecise, while SfM is more precise. GPS is likely used to initialize the pose, and SfM then refines it.
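
As a rough illustration of what "GPS as a pose prior" could look like, the sketch below converts latitude/longitude tags into metres east and north of a reference image, which could then seed the camera positions before SfM refinement. The function name, the spherical-Earth constant, and the flat local approximation are assumptions for illustration, not something taken from the project; the approximation is only reasonable over the small area of a single survey.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def gps_to_local_xy(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg):
    """Approximate (east, north) offset in metres from a reference point.

    Uses an equirectangular projection around the reference latitude,
    which is adequate for offsets of a few kilometres at most.
    """
    ref_lat = math.radians(ref_lat_deg)
    east = math.radians(lon_deg - ref_lon_deg) * math.cos(ref_lat) * EARTH_RADIUS_M
    north = math.radians(lat_deg - ref_lat_deg) * EARTH_RADIUS_M
    return east, north


# Hypothetical image tagged 0.001 degrees north of the reference image.
east, north = gps_to_local_xy(48.001, 11.0, 48.0, 11.0)
```

Consumer GPS tags are typically off by metres, so positions like these are a starting guess rather than a constraint to trust exactly.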
Would a high-quality pre-registration via precise coordinates and orientation yield better (and faster) results, though? If the algorithm did not have to guess the camera parameters, I would imagine there would be a benefit.
In practice, there are a couple of ways to handle this. For one, EXIF data gives you a lot of information (sensor size, camera/drone make, etc.) from which you can derive very good camera intrinsics. The camera make and model can also be looked up in big EXIF databases, from which you can obtain pretty accurate camera parameters.
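
A minimal sketch of the EXIF-to-intrinsics step, assuming square pixels, no lens distortion, and a principal point at the image centre: the focal length in pixels is the focal length in millimetres scaled by image width over sensor width. The function names and the example numbers are hypothetical, not read from any particular camera.

```python
def focal_length_px(focal_mm, sensor_width_mm, image_width_px):
    """Convert an EXIF focal length in mm to pixels using the sensor width."""
    return focal_mm * image_width_px / sensor_width_mm


def intrinsics_matrix(focal_mm, sensor_width_mm, width_px, height_px):
    """Build an initial 3x3 intrinsics matrix K as nested lists.

    Assumes square pixels and a principal point at the image centre.
    """
    f = focal_length_px(focal_mm, sensor_width_mm, width_px)
    return [
        [f, 0.0, width_px / 2.0],
        [0.0, f, height_px / 2.0],
        [0.0, 0.0, 1.0],
    ]


# Hypothetical camera: 8.8 mm lens, 13.2 mm wide sensor, 5472x3648 image.
K = intrinsics_matrix(8.8, 13.2, 5472, 3648)
```

A matrix like this is only an initial estimate; reconstruction pipelines commonly refine the focal length and distortion terms during optimization.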

Additionally, camera intrinsic calibration is a relatively solved problem, especially if you know that every single camera has the same intrinsics.