by bane
683 days ago
Visualizing large graphs is a natural desire for people with lots of connected data. But past a fairly small size, there's almost no utility in visualizing graphs. It's much more useful to compute various measures on the graph, and then query the graph using some combination of node/edge values and these computed values. You might subset out the nodes and edges of particular interest if you really want to see them -- or don't visualize at all and just inspect the graph nodes and edges very locally with some kind of tabular data viewer.

It used to be thought that visualizing super large graphs would reveal some kind of macro-scale structural insight, but it turns out that the visual structure ends up dominated by the graph layout algorithm and by the need to squash often inherently high-dimensional structures into 2 or 3 dimensions. You end up basically seeing patterns in the artifacts of the algorithm instead of any real structure.

There's a similar but unrelated desire to overlay sequenced transaction data (like transportation logs) on a geographical map as a kind of visualization, which also almost never reveals any interesting insights. The better technique is almost always a different abstraction, like a sequence diagram with the lanes being aggregated locations.

There's a bunch of these kinds of pitfalls in visualization that people who work in the space inevitably end up grinding against for a while before realizing it's pointless or that there's a better abstraction.

(source: I ran an infoviz startup for a few years that dealt with this exact topic)
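To make the "compute measures, then query, then subset" workflow concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy edge list, the choice of degree as the computed measure, the threshold of 3, and the helper names `build_adjacency`, `degree`, and `ego_subgraph` are all hypothetical, not anything from a particular tool.

```python
from collections import deque

# Hypothetical toy data: an undirected graph given as an edge list.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
         ("d", "e"), ("e", "f"), ("f", "g")]

def build_adjacency(edges):
    """Turn an edge list into a dict mapping node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj):
    """One simple computed measure: number of neighbors per node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def ego_subgraph(adj, center, radius=1):
    """Subset the graph to nodes within `radius` hops of `center` (BFS)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand past the requested radius
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    keep = set(dist)
    # Keep only edges whose endpoints are both inside the subset.
    return {node: adj[node] & keep for node in keep}

adj = build_adjacency(edges)
deg = degree(adj)

# Query on the computed measure instead of looking at a picture.
hubs = sorted(node for node, d in deg.items() if d >= 3)

# Inspect one neighborhood locally, as a small table.
local = ego_subgraph(adj, "a", radius=1)
for node in sorted(local):
    print(node, deg[node], sorted(local[node]))
```

On a real graph the measure would more likely be something like centrality or component size, but the shape is the same: the interesting nodes come out of a query, and only a small neighborhood ever gets looked at (or drawn).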
|
I want to stress this point and go a bit further. It can be worse, because people are prone to pareidolia[0], the tendency to see order in disorder, like seeing familiar shapes in the clouds. The danger with large visualizations such as these is that instead of conveying useful information, you counterproductively convince someone of something that isn't true! Here's a relevant 3B1B video where this is kinda discussed. There is real meaning in the pattern, but the point is that it is also easy to be convinced of things that aren't true[1]. In fact, Grant's teaching style is so good that you might even convince yourself of the thing he is disproving while he is revealing how we are tricked by the visualization. Remember what the original SO person latched onto.
I think it's important to recognize that visualization is a nontrivial exercise. Grant makes an important point at the end, restating how the visualization was an artifact and how if you dig deep enough into an arbitrary question, you _can_ find value. Because at the end of the day, there are rules to these things. The same is true about graphs. There will always be value in the graph, but the point of graphing is to highlight the concepts that we want to convey. In a way, many people have it backwards about what graphs do and why we use them. You don't create visualizations to then draw value from them; rather, your plots are a mathematical analysis expressed in a more natural language for humans. This is subtle and might be confusing, because people are often able to intuit what kind of graph should be used to convey data without thinking about what the process is doing. So what I'm saying is that you don't want to use arbitrary graphs; there's a right graph for the job. You can find a lot of blogs on graph sins[2], and this point will become clearer.
At heart, this is not so different from "lies, damned lies, and statistics." People often lie with data without stating anything that is untrue. With graphs, you can lie without stating a word, despite a picture being worth a thousand. So the most important part of being a data scientist is not lying to yourself (which sounds harder than it is).
[0] https://en.wikipedia.org/wiki/Pareidolia
[1] https://www.youtube.com/watch?v=EK32jo7i5LQ
[2] Except this might be hard, because if you Google this you'll have a hard time convincing Google that you don't mean "sine". So instead search "graph deadly sins", "data visualization sins", "data is ugly", and so on. I'll pass you specifically the blog of "Dr. Moron" (Kenneth Moreland) and one discussion of bad plots: https://www.drmoron.org/posts/better-plots/ (Ken is a data visualization expert and both his blogs have a lot on vis). There's also vislies: https://www.vislies.org/2021/
(Source: started my PhD in viz and still have close friends in infoviz and sciviz whose rants about their research I get to hear, and occasionally I contribute)