HN Mirror

Hacker News

by n4r9 2718 days ago

This is cool. I'm working on something similar to divide up areas for municipal waste collection based on travel times between properties.

> for this problem, we are defining the distance matrix directly from the source data. This results in the unusual property that the feature space distance from A to B is likely to be different from the feature space distance from B to A, as more commuters will commute in one direction than the other. For this project, we made the decision to examine origin to destination commute flows only, as this resulted in the clusters that were clearly defined in both feature space and real space, while the inverse resulted in clusters that were significantly overlapping in real space.

I had a similar problem, since the travel time from A to B can differ dramatically from the travel time from B to A. I experimented with a few different ways of symmetrising the matrix and found that taking the maximum of both values was a pretty good compromise.
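A minimal sketch of the symmetrisation idea, assuming the travel times are held in a plain square list-of-lists where `d[i][j]` is the time from i to j. The helper name `symmetrise` and the `min`/`mean` alternatives are illustrative only, not n4r9's actual code; `max` is the option the comment describes.

```python
from typing import List

Matrix = List[List[float]]

def symmetrise(d: Matrix, how: str = "max") -> Matrix:
    """Return a symmetric copy of an asymmetric travel-time matrix.

    d[i][j] is the travel time from i to j. "max" keeps the slower of the
    two directions for each pair; "min" and "mean" are shown for comparison.
    """
    combine = {
        "max": max,
        "min": min,
        "mean": lambda a, b: (a + b) / 2,
    }[how]
    n = len(d)
    return [[combine(d[i][j], d[j][i]) for j in range(n)] for i in range(n)]

# Hypothetical 3-property example: travel[i][j] = minutes from i to j.
travel = [[0, 4, 9], [6, 0, 3], [7, 5, 0]]
print(symmetrise(travel))  # [[0, 6, 9], [6, 0, 5], [9, 5, 0]]
```

One way to read the "max" choice: a pair only counts as close if it is quick to travel in both directions, so a cluster never relies on a one-way shortcut.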

I also found that hierarchical clustering didn't work as well as K-means/K-medoids when the "clusters" were not necessarily very well-defined and the number of data points was in the thousands.
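K-medoids is convenient here because it only needs a distance matrix (such as symmetrised travel times), whereas K-means needs coordinates to average. Below is a minimal sketch of the simple alternating ("Voronoi iteration") variant of K-medoids, not PAM, operating on a precomputed symmetric matrix. The function name `k_medoids`, the random initialisation and the `max_iter`/`seed` parameters are assumptions for illustration.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def k_medoids(d: Matrix, k: int, max_iter: int = 100,
              seed: int = 0) -> Tuple[List[int], List[int]]:
    """Alternating K-medoids on a precomputed symmetric distance matrix.

    Returns (medoids, labels): the medoid point indices and, for each point,
    the index into `medoids` of the cluster it is assigned to.
    """
    n = len(d)
    rng = random.Random(seed)
    medoids = sorted(rng.sample(range(n), k))  # random initial medoids
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
        # Update step: in each cluster, choose the member with the smallest
        # total distance to the other members.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # empty cluster: keep its previous medoid
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:  # converged
            break
        medoids = new_medoids
    labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
    return medoids, labels

# Hypothetical data: two well-separated groups of points on a line.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
medoids, labels = k_medoids(dist, k=2)
print(sorted(medoids))  # [1, 4]
```

With thousands of properties this pure-Python loop is slow; a library implementation that accepts a precomputed matrix would be the practical choice.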


steve_gh 2718 days ago

@n4r9. Where in the world are you working? I would be interested in getting in touch. You can reach me at stephen dot gooberman hyphen hill at amey dot co dot uk
