by isani 4214 days ago
A maximum spanning tree might be misleading, as it's easy to read a missing edge as meaning no correlation. When building a tree, weak correlations may be included out of necessity, while stronger ones that would create cycles are omitted.

If several dimensions are correlated just about equally strongly, you can get very different trees based on small random variation. There's no guarantee that all significant correlations are displayed, or that correlated dimensions are visually close to one another.
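
To make both points concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the function name `max_spanning_tree`, the `(weight, u, v)` edge format, and the correlation values are made up, and Kruskal's algorithm is just one common way to build such a tree, not necessarily what the original post used.

```python
def max_spanning_tree(n, edges):
    """Kruskal's algorithm, taking edges in order of descending weight.

    n     -- number of variables, labelled 0..n-1
    edges -- list of (weight, u, v) tuples, weight = correlation strength
    Returns the chosen edges as (u, v) pairs, strongest first.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # skip any edge that would close a cycle
            parent[root_u] = root_v
            tree.append((u, v))
    return tree


# Hypothetical correlations: variables 0, 1, 2 are all strongly correlated,
# variable 3 is barely correlated with anything.
edges = [
    (0.90, 0, 1),
    (0.88, 1, 2),
    (0.85, 0, 2),  # strong, but closes the cycle 0-1-2
    (0.10, 2, 3),  # weak, but the only way left to attach 3
    (0.05, 0, 3),
    (0.02, 1, 3),
]
tree = max_spanning_tree(4, edges)

# Three nearly equal correlations, then the same data with tiny noise.
tree_a = max_spanning_tree(3, [(0.801, 0, 1), (0.800, 1, 2), (0.799, 0, 2)])
tree_b = max_spanning_tree(3, [(0.799, 0, 1), (0.800, 1, 2), (0.801, 0, 2)])
```

In this toy data the 0.85 edge between 0 and 2 is left out while the 0.10 edge between 2 and 3 is kept, and `tree_a` and `tree_b` differ even though no correlation moved by more than 0.002.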


I agree, it's not perfect - just a useful abstraction. It's the same as arbitrary thresholds for correlation or a p<0.05 significance level: often you lose information but gain insight. From personal experience, I've seen MSTs map out underlying structures that validate the classical chemical kinetics of a system along a logical path, something that would not have been apparent with ordinary thresholding approaches.

Basically, IMO it's good to use all of these techniques together to get a good picture of your system. In the end, the greatest limitation is our human ability to interpret the results, which frankly needs all the help it can get.

Thank you for the feedback. I prefer to use a graph instead of a tree because I want to spot clusters of relations.
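
For comparison, a rough sketch of the graph-based view: keep every edge above a cutoff and read clusters off the connected components. The function name `threshold_clusters`, the `(weight, u, v)` edge format, the 0.5 cutoff, and the correlation values are all assumptions chosen for illustration.

```python
def threshold_clusters(n, edges, threshold):
    """Keep edges with |weight| >= threshold, return connected components.

    n     -- number of variables, labelled 0..n-1
    edges -- list of (weight, u, v) tuples, weight = correlation
    Returns a list of clusters, each a sorted list of variable indices.
    """
    adjacency = {i: set() for i in range(n)}
    for weight, u, v in edges:
        if abs(weight) >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search to collect everything reachable from start.
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical correlations: 0, 1, 2 form a tight group, 3 stands apart.
edges = [
    (0.90, 0, 1),
    (0.88, 1, 2),
    (0.85, 0, 2),
    (0.10, 2, 3),
    (0.05, 0, 3),
    (0.02, 1, 3),
]
clusters = threshold_clusters(4, edges, threshold=0.5)
```

Here all three strong edges among 0, 1, and 2 survive, so the group shows up as one cluster and 3 is left on its own. The trade-off is that the result depends on the arbitrary cutoff.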