Hacker News
by jfim 4302 days ago
Simultaneous localization and mapping (SLAM) refers to a family of algorithms that use sensors to determine egomotion (how much a robot moves in an environment) while building a map of that environment at the same time.

At each time step, the algorithm senses its environment and compares what it sees with the previous time step. It works out whether it saw any new features to add to the map, and how far the features matched between the two time steps have moved, and infers egomotion from that. This doesn't necessarily have to be done with cameras; it can also be done with laser rangefinders and other relatively accurate sensors.
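To make the "how much did the matched features move" step concrete, here is a minimal 2D sketch. It is not how any particular SLAM system does it, just a least-squares rigid fit under strong assumptions: the features are already matched, they are static landmarks, and they are given as (x, y) points in the robot's own frame. The function names `estimate_rigid_2d` and `egomotion_from_features` are made up for illustration.

```python
import math

def estimate_rigid_2d(prev_pts, curr_pts):
    """Least-squares rotation and translation mapping prev_pts onto curr_pts.

    Returns (theta, tx, ty) such that curr ~= R(theta) * prev + (tx, ty).
    """
    n = len(prev_pts)
    if n < 2 or n != len(curr_pts):
        raise ValueError("need at least two matched feature pairs")

    # Centroids of both point sets.
    pcx = sum(p[0] for p in prev_pts) / n
    pcy = sum(p[1] for p in prev_pts) / n
    qcx = sum(q[0] for q in curr_pts) / n
    qcy = sum(q[1] for q in curr_pts) / n

    # Accumulate dot and cross products of the centered points.
    s_dot = 0.0
    s_cross = 0.0
    for (px, py), (qx, qy) in zip(prev_pts, curr_pts):
        ax, ay = px - pcx, py - pcy
        bx, by = qx - qcx, qy - qcy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx

    # Rotation that best aligns the centered sets, then the translation.
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    tx = qcx - (c * pcx - s * pcy)
    ty = qcy - (s * pcx + c * pcy)
    return theta, tx, ty

def egomotion_from_features(prev_pts, curr_pts):
    """Robot motion (rotation, dx, dy) expressed in the previous robot frame.

    Static landmarks appear to move opposite to the robot, so the egomotion
    is the inverse of the apparent feature motion.
    """
    theta, tx, ty = estimate_rigid_2d(prev_pts, curr_pts)
    c, s = math.cos(theta), math.sin(theta)
    # Inverse transform: rotation -theta, translation -R(theta)^T * t.
    return -theta, -(c * tx + s * ty), -(-s * tx + c * ty)
```

A real system also has to deal with bad matches, sensor noise and moving objects, which this sketch ignores completely.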

Monocular SLAM (MonoSLAM is also the name of a well-known paper) is SLAM done with a single camera, which makes the problem harder than with two cameras. With two cameras affixed to a rigid frame and known characteristics, it's possible to determine the 3D position of any given feature that is seen by both cameras at the same time. With a single camera it's trickier: only the angle to a given feature can be determined, not its 3D position, so an optimization step has to be done to determine the likeliest solution to the problem.
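A small sketch of why two cameras make this easier, assuming an idealized pinhole model with rectified, side-by-side cameras. The names and numbers (`focal_px`, `baseline_m`, a 500 px focal length, a 0.1 m baseline) are assumptions picked for illustration, not taken from any real rig.

```python
import math

def project(focal_px, cx, x_m, z_m):
    """Horizontal pixel coordinate of a point at (x_m, z_m) in the camera frame."""
    return focal_px * x_m / z_m + cx

def stereo_depth(focal_px, baseline_m, u_left, u_right):
    """Depth from the disparity between rectified left and right images."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * baseline_m / disparity

def bearing(focal_px, cx, u):
    """Angle to a feature from a single image; says nothing about its distance."""
    return math.atan2(u - cx, focal_px)

# Hypothetical rig: 500 px focal length, principal point at 320 px, 0.1 m baseline.
F, CX, B = 500.0, 320.0, 0.1

# Stereo: a point 0.5 m to the side and 2 m ahead, seen by both cameras.
u_left = project(F, CX, 0.5, 2.0)
u_right = project(F, CX, 0.5 - B, 2.0)   # right camera sits B metres to the right
depth = stereo_depth(F, B, u_left, u_right)

# Mono: a point twice as far away along the same ray lands on the same pixel,
# so one image cannot tell the two apart.
u_near = project(F, CX, 0.5, 2.0)
u_far = project(F, CX, 1.0, 4.0)
same_bearing = math.isclose(bearing(F, CX, u_near), bearing(F, CX, u_far))
```

With the stereo pair the depth drops out of a single division. With one camera, `u_near` and `u_far` are identical, which is the ambiguity the optimization step has to resolve.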

There's also more to read on the relevant Wikipedia article, at http://en.wikipedia.org/wiki/Simultaneous_localization_and_m...

1 comment

OK, so my understanding is pretty much the gist of that: you see how you've moved by comparing features extracted from a series of images taken over time. I just don't understand the maths :)

The reason we went with a single camera is lack of space. As you can see from some of the imagery of the product, the camera stack is a huge proportion of the machine. Also, when the algorithms were being developed in the early 2000s, cameras were still expensive bits of kit. I seem to remember the first one being 1024x1024 resolution, pretty poor for photography, but good enough for feature mapping with SLAM.