They have gone round and taken video of every street in a certain area, unpacked it, extracted salient points, and reconstructed those points to get a 3D map.
From that, given any 2D image, you should be able to extract a bunch of "salient points" (known points) whose relationship to each other tells you where the camera is and what direction it's pointing.
The two hard parts are 1) collecting the data and 2) searching the data in a reasonable amount of time.
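To make the second hard part concrete, here is a toy sketch of how descriptor lookup can avoid a brute-force scan. It assumes the salient points are stored as 32-bit binary descriptors (real systems use longer ones) and splits each descriptor into 4 chunks: by the pigeonhole principle, any stored descriptor within Hamming distance 3 of the query must match it exactly in at least one chunk, so only those buckets need checking. The `DescriptorIndex` class and its parameters are made up for illustration; this is not a claim about how any particular company does it.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded descriptors."""
    return bin(a ^ b).count("1")


class DescriptorIndex:
    """Toy multi-table index mapping binary descriptors to 3D map points.

    Hypothetical illustration only: 32-bit descriptors, 4 chunks of 8 bits.
    """

    def __init__(self, bits: int = 32, chunks: int = 4):
        self.chunks = chunks
        self.chunk_bits = bits // chunks
        self.mask = (1 << self.chunk_bits) - 1
        # One hash table per chunk position: chunk value -> list of point ids.
        self.tables = [defaultdict(list) for _ in range(chunks)]
        self.points = []  # list of (descriptor, xyz)

    def _keys(self, descriptor: int):
        """Split a descriptor into its per-chunk values."""
        return [
            (descriptor >> (i * self.chunk_bits)) & self.mask
            for i in range(self.chunks)
        ]

    def add(self, descriptor: int, xyz):
        idx = len(self.points)
        self.points.append((descriptor, xyz))
        for table, key in zip(self.tables, self._keys(descriptor)):
            table[key].append(idx)

    def query(self, descriptor: int, max_dist: int = 3):
        """Return (distance, xyz) of the closest stored point, or None.

        Exact only while max_dist < number of chunks (pigeonhole argument).
        """
        candidates = set()
        for table, key in zip(self.tables, self._keys(descriptor)):
            candidates.update(table.get(key, ()))
        best = None
        for idx in candidates:
            stored, xyz = self.points[idx]
            dist = hamming(stored, descriptor)
            if dist <= max_dist and (best is None or dist < best[0]):
                best = (dist, xyz)
        return best


index = DescriptorIndex()
index.add(0xF0A5C369, (1.0, 2.0, 3.0))   # a mapped street corner, say
index.add(0x00000000, (0.0, 0.0, 0.0))

near_match = index.query(0xF0A5C369 ^ 0b101)  # same point, 2 bits of noise
no_match = index.query(0xFFFFFFFF)            # nothing similar in the map
```

Once enough 2D image points are matched to 3D map points this way, the camera position and direction would come from a separate pose-estimation step, which is not shown here.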
the real thing that i'm puzzled by with blue vision is how they're registering against ARKit descriptors (if they are at all) since apple doesn't expose them in the ARKit api (only the point cloud itself). ARCore used to expose them (https://stackoverflow.com/a/29012790) but i don't think it does anymore. they must be doing the registration because they only support devices that are running ARKit/ARCore (and without it they would just have built a SLAM system - albeit backed by an "arcloud" - that sits beside ARKit/ARCore and would most likely be inferior).
> the real thing that i'm puzzled by with blue vision is how they're registering against ARKit descriptors (if they are at all) since apple doesn't expose them in the ARKit api (only the point cloud itself). ARCore used to expose them (https://stackoverflow.com/a/29012790) but i don't think it does anymore. they must be doing the registration because they only support devices that are running ARKit/ARCore (and without it they would just have built a SLAM system - albeit backed by an "arcloud" - that sits beside ARKit/ARCore and would most likely be inferior).
I have had a look at their API documentation: they provide you with an anchor, and that's where you attach your SCNNodes. They rely on the platform's built-in SLAM tracking to position your SCNNodes, but these are all relative to the main anchor, which is how they achieve stability and persistence.
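A rough sketch of why anchoring helps, assuming poses are plain 4x4 rigid transforms (the helper names `pose`, `compose` and `position` are invented for this example and are not part of any SDK): content is stored relative to the anchor, so when the anchor's world pose gets corrected, everything attached to it moves along without being touched.

```python
import math


def pose(x, y, z, yaw_deg=0.0):
    """4x4 rigid transform: rotation about the vertical (y) axis, then translation."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [
        [c, 0.0, s, x],
        [0.0, 1.0, 0.0, y],
        [-s, 0.0, c, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a, b):
    """Matrix product a * b: apply b in a's coordinate frame."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def position(m):
    """Translation part of a 4x4 transform."""
    return (m[0][3], m[1][3], m[2][3])


# The node is stored once, relative to the anchor: 2 m along the anchor's -z axis.
node_local = pose(0.0, 0.0, -2.0)

# Initial estimate of where the anchor sits in the world.
anchor_world = pose(10.0, 0.0, 5.0)
before = position(compose(anchor_world, node_local))

# Later the anchor estimate is refined; node_local itself never changes.
anchor_world = pose(10.5, 0.0, 5.0, yaw_deg=90.0)
after = position(compose(anchor_world, node_local))
```

Here `before` works out to (10, 0, 3) and `after` to roughly (8.5, 0, 5): the node followed the anchor correction even though only the anchor pose was updated.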