Hacker News new | ask | show | jobs
by dnquark 5126 days ago
What finally made the curse of dimensionality "click" for me was recognizing that the volume of an n-sphere goes to zero as n grows large. In 2D, if I want to see which of my data points lie close to some reference point, I can draw a circle around my reference point and count the proportion of the data points inside. In 3D, I could do the same with a sphere. But in 20D, the volume of my hypersphere will be miniscule, and it will probably not even contain any points, so this procedure becomes useless for clustering. Note that I can't just pick the hypersphere of a larger radius: suppose all my points are contained inside a unit n-cube. The volume of the unit n-sphere, the largest one I could inscribe inside the n-cube will _still_ be approximately zero!

This result is pretty cool and somewhat surprising, and there's a neat and short derivation of it; in fact (shameless plug) I have a youtube video where I derive it in 3 minutes: http://www.youtube.com/watch?v=QkMn_5QpsX8 -- feel free to check it out!

2 comments

Very nice video. For those less able to follow the calculus, another way to get at least a little intuition about this result is to think about how far the farthest points in a unit cube are from the origin - in 20 dimensions, the point (1,1,...1,1) is sqrt(20) ~= 4.47 units from the origin, so the 19-sphere only "reaches" 1/sqrt(20) ~= 0.22 of the way to the corners of the 20-cube.
Thank you, Sir! You are a scholar and a gentleman.