|
|
|
|
|
by dnquark
5126 days ago
|
|
What finally made the curse of dimensionality "click" for me was recognizing that the volume of an n-sphere goes to zero as n grows large. In 2D, if I want to see which of my data points lie close to some reference point, I can draw a circle around my reference point and count the proportion of the data points inside. In 3D, I could do the same with a sphere. But in 20D, the volume of my hypersphere will be miniscule, and it will probably not even contain any points, so this procedure becomes useless for clustering. Note that I can't just pick the hypersphere of a larger radius: suppose all my points are contained inside a unit n-cube. The volume of the unit n-sphere, the largest one I could inscribe inside the n-cube will _still_ be approximately zero! This result is pretty cool and somewhat surprising, and there's a neat and short derivation of it; in fact (shameless plug) I have a youtube video where I derive it in 3 minutes: http://www.youtube.com/watch?v=QkMn_5QpsX8 -- feel free to check it out! |
|