Hacker News new | ask | show | jobs
by cechner 2549 days ago
yep - look up 'Hidden Markov Map Matching through Noise and Sparseness' by Krumm and Newson

it calculates the final probability by combining 'emission' probabilities (the probability that a GPS observation was on a particular road) by the 'transition' probability that if an observation was on a particular road at one point, what is the probability that it is now on this other road segment. By combining these two the final probability incorporates both the nearness of the GPS signals to the roads and the connectivity of the road network itself.

I've found the formulas applied in this paper are good in practice only if the GPS updates are relatively frequent

1 comments

I've also found that it can be tricky to map "convoluted" routes that don't necessary go simply from point A to pointB.

If you don't mind me asking, roughly what frequency threshold have you found the algorithm to perform badly above, and are you aware of any algorithms or formulae which perform better in these situations?

it should be in seconds - the problem is that the paper assumes that the 'great circle' (straight line) distance between two points should be almost the same as the 'route' distance between those points, with an exponential probability distribution.

This means that if the path between two points is not simple (around a corner) the probability drops off very quickly. If the time between measurements is in minutes, this heuristic is pretty useless (and you should really use log-scale for your numbers!)

edit: this is actually shown in figure 8 of the paper where they explore different 'sampling periods'

edit 2: I have not explored other methods yet, but it would probably make sense to start by deriving the formula the way they do, by exploring ground-truth data.

edit 3: I just noticed that my comments are largely repeating what you're saying - sorry!

Ah, that rings a bell now. You can vary a parameter they call "beta" to allow for more convoluted routes, and I think a larger value gives a little leeway for less frequent fixes.

Agreed, the log scale is really important to avoid arithmetic underflow =] I believe OSRM and Graphhopper both do it that way. In my implementation I've flipped from thinking of measurement/transition "probabilities" to "disparities", and I choose the final route that has the least disparity from the trail. It seems to handle trails with around a 30-60s frequency over a 5-10hr period with decent accuracy.

actually, beta is less useful than that! I think it represents the median difference between the two distances, its not a tolerance (at least as far as I can recall after experimenting with tuning this value).

As with you, I have found that it still often gives ok results with slower frequencies, as long as the transition probabilities are still relatively in the same scale as each other for a particular observation pair. However it means that there's no point trying to 'tune' using the gamma and beta parameters