by nerdponx 3034 days ago
The behavior the article describes is, to me, way more bizarre than the curse of dimensionality.

It's tempting to think of data sets as "point clouds". This article is a reality check for me: you can't safely apply intuition about 2- and 3-d point clouds to higher-dimensional data. I suspect that this explains why methods like t-SNE seem to produce unstable results depending on the parameters [0]. The notion of a "neighbor" in high dimensions is just not what I think it is.
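One rough way to see how slippery "neighbor" gets is the toy experiment below. It is only a sketch under my own assumptions (uniform random points in a unit cube, Euclidean distance, a made-up helper called `distance_contrast`), not anything taken from the article: it compares the farthest and nearest distances from one query point as the dimension grows.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of the farthest to the nearest Euclidean distance from one
    random query point to n_points uniform random points in [0, 1]^dim.

    Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return max(dists) / min(dists)


if __name__ == "__main__":
    for dim in (2, 3, 10, 100, 1000):
        print(f"dim={dim:5d}  farthest/nearest = {distance_contrast(dim):.2f}")
```

In low dimensions the ratio is large, so the nearest point really is much closer than the rest. As the dimension grows the ratio drifts toward 1, meaning the "nearest" neighbor is barely nearer than the farthest one, at least for this kind of unstructured random data.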

I suppose the same is true for high-dimensional cost surfaces. Gradient descent is often described as "like walking down a hill". But without a deep understanding of high-dimensional geometry, I'm not at all confident that I know what a 4-, 10-, or 1000-dimensional hill looks like.

The lesson: Be skeptical of my own geometric intuition unless it is firmly backed by math.

[0]: https://distill.pub/2016/misread-tsne/

1 comment

Indeed. I find https://www.thestar.com/news/insight/2016/01/16/when-us-air-... to be a good cautionary tale on how our intuition about people being close to average is misleading - nobody is. And nobody is particularly like anyone else, either.

On a thousand-dimensional hill, my intuition is that it locally looks like a low-dimensional hill, along axes that you can find through techniques like principal component analysis. This has yet to mislead me. On the other hand, my pure math background was a long time ago, and I have not explored machine learning in any real depth...
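To make that intuition concrete, here is a small sketch, with the caveat that the cost function is one I made up so that the intuition holds by construction: f(x) = steep * (u.x)^2 + 0.5 * |x|^2, which is steep along one hidden direction u and gentle everywhere else. Running PCA on sampled gradients (my choice of what to feed PCA, done here with a plain power iteration) should recover u. The names `sample_gradients` and `top_component` are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sample_gradients(dim=20, n=300, steep=50.0, seed=0):
    """Gradients of the toy cost f(x) = steep*(u.x)^2 + 0.5*|x|^2 at random
    points x, where u is a hidden random unit vector (assumed setup)."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(dot(u, u))
    u = [a / norm for a in u]
    grads = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(dim)]
        ux = dot(u, x)
        # grad f(x) = 2*steep*(u.x)*u + x
        grads.append([2 * steep * ux * u[i] + x[i] for i in range(dim)])
    return u, grads


def top_component(samples, iters=50, seed=1):
    """Leading principal direction of the samples via power iteration on the
    sample covariance, plus the share of total variance it explains."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    centered = [[s[i] - mean[i] for i in range(dim)] for s in samples]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(dim)]
    for _ in range(iters):
        # w = C v, accumulated sample by sample without forming C explicitly
        w = [0.0] * dim
        for s in centered:
            proj = dot(s, v)
            for i in range(dim):
                w[i] += proj * s[i] / n
        norm = math.sqrt(dot(w, w))
        v = [a / norm for a in w]
    top_var = sum(dot(s, v) ** 2 for s in centered) / n
    total_var = sum(dot(s, s) for s in centered) / n
    return v, top_var / total_var


u, grads = sample_gradients()
v, share = top_component(grads)
alignment = abs(dot(u, v))
print(f"variance share of top direction: {share:.3f}")
print(f"alignment with hidden steep axis: {alignment:.3f}")
```

On this toy, one direction out of twenty carries nearly all of the gradient variance, and it lines up with the hidden axis. That shows what the intuition would look like when it is true; it says nothing about whether real cost surfaces behave this way.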

But what this article makes me think is that even "low" dimensions can't be trusted, as long as they're greater than 3.