by KaiserPro
3020 days ago
Pretty much that. They have gone round and taken video of every street in a certain area, unpacked it, extracted salient points, and reconstructed those points into a 3D map. From that, given any 2D image, you should be able to extract a bunch of "salient points" (known points) whose relationship to each other tells you where the camera is and which direction it's pointing. The two hard parts are 1) collecting the data and 2) searching the data in a reasonable amount of time.
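To make the "relationship between known points tells you where the camera is" idea concrete, here is a rough sketch that scores a candidate camera pose by reprojection error: project the known 3D map points through a pinhole camera and compare them with where they were seen in the image. Everything here is an illustrative assumption, not how any particular product does it: the names `project` and `reprojection_error`, the focal length, and the simplification that the camera only rotates about the vertical axis. A real system would estimate the full 6-DoF pose with a PnP solver plus outlier rejection.

```python
import math

def project(point, cam_pos, yaw, focal=500.0):
    """Project a 3D world point into a simplified pinhole camera.

    Assumptions (for illustration only): the camera rotates only about
    the vertical (y) axis, looks along +z when yaw == 0, and has its
    principal point at (0, 0). Returns None if the point is behind it.
    """
    # Offset of the point from the camera centre, in world axes.
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dz = point[2] - cam_pos[2]
    # Express that offset in the camera frame (right = x, forward = z).
    c, s = math.cos(yaw), math.sin(yaw)
    x = c * dx - s * dz
    z = s * dx + c * dz
    if z <= 0:
        return None
    return (focal * x / z, focal * dy / z)

def reprojection_error(points3d, observed2d, cam_pos, yaw):
    """RMS pixel distance between projected map points and observations."""
    total = 0.0
    for p, obs in zip(points3d, observed2d):
        proj = project(p, cam_pos, yaw)
        if proj is None:
            return math.inf  # this pose cannot see the point at all
        total += (proj[0] - obs[0]) ** 2 + (proj[1] - obs[1]) ** 2
    return math.sqrt(total / len(points3d))

# Hypothetical map points and a "true" pose used to synthesise observations.
MAP_POINTS = [(0, 0, 10), (2, 1, 12), (-3, 0.5, 8), (1, -1, 15)]
TRUE_POS, TRUE_YAW = (0.5, 0.0, 0.0), 0.1
OBSERVED = [project(p, TRUE_POS, TRUE_YAW) for p in MAP_POINTS]

good = reprojection_error(MAP_POINTS, OBSERVED, TRUE_POS, TRUE_YAW)
bad = reprojection_error(MAP_POINTS, OBSERVED, (0.0, 0.0, 0.0), 0.0)
print(f"error at true pose: {good:.3f} px, at wrong pose: {bad:.3f} px")
```

The pose that minimises this error is the localisation answer; the search over poses (and over which part of the map to compare against) is where the "reasonable amount of time" problem comes in.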
|
https://youtu.be/tXwVg2S9wuY?t=60
the "salient points" are called keypoints and their feature vectors are called descriptors
https://en.wikipedia.org/wiki/Scale-invariant_feature_transf...
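as a rough illustration of how descriptors get used: a query image's descriptors are matched to the map's descriptors by nearest-neighbour distance, and ambiguous matches are dropped with the ratio test described in the SIFT paper. the sketch below is plain python with made-up 4-dimensional descriptors (real SIFT descriptors are 128-dimensional); the function name `match_descriptors` and the 0.75 threshold are assumptions for the example, and the brute-force scan stands in for the approximate nearest-neighbour index a large-scale system would need.

```python
import math

def match_descriptors(query, database, ratio=0.75):
    """Return (query_index, database_index) pairs that pass the ratio test.

    A match is kept only if the closest database descriptor is clearly
    closer than the second closest one (best < ratio * second).
    """
    matches = []
    for qi, q in enumerate(query):
        # Brute force: distance from this query descriptor to every entry.
        dists = sorted((math.dist(q, d), di) for di, d in enumerate(database))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (best, best_i), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((qi, best_i))
    return matches

# Toy data: three map descriptors and two query descriptors.
database = [(0, 0, 0, 1), (1, 0, 0, 0), (0, 1, 0, 0)]
query = [
    (0.9, 0.1, 0, 0),  # clearly closest to database[1]
    (0.5, 0.5, 0, 0),  # equally close to database[1] and database[2]
]
print(match_descriptors(query, database))
```

the second query descriptor is rejected because its two best candidates are equally far away, which is exactly the kind of ambiguous match the ratio test is meant to throw out.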
you are correct that the challenge is collecting and indexing/retrieving the data, but techniques for this have been around for a while
https://web.eecs.umich.edu/~michjc/papers/p144-park.pdf
(they even tested against SIFT descriptors)
the real thing that i'm puzzled by with blue vision is how they're registering against ARKit descriptors (if they are at all), since apple doesn't expose them in the ARKit api (only the point cloud itself). ARCore used to expose them (https://stackoverflow.com/a/29012790) but i don't think it does anymore. they must be doing the registration, because they only support devices that are running ARKit/ARCore. without it they would just have built a SLAM system (albeit backed by an "arcloud") that sits beside ARKit/ARCore and would most likely be inferior.