
by bane 683 days ago

Visualizing large graphs is a natural desire for people with lots of connected data. But after a fairly small size, there's almost no utility in visualizing graphs. It's much more useful to compute various measures on the graph, and then query the graph using some combination of node/edge values and these computed values. You might subset out the nodes and edges of particular interest if you really want to see them -- or don't visualize at all and just inspect the graph nodes and edges very locally with some kind of tabular data viewer.
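To make the "compute measures, then query" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy edge list, the `NODE_KIND` attribute, and the choice of degree and summed edge weight as the computed measures. It is not taken from any particular tool.

```python
from collections import defaultdict

# Hypothetical toy data: (source, target, weight) edges and a per-node attribute.
EDGES = [
    ("a", "b", 3), ("a", "c", 1), ("a", "d", 2),
    ("b", "c", 5), ("d", "e", 1), ("e", "f", 4),
]
NODE_KIND = {"a": "server", "b": "server", "c": "client",
             "d": "client", "e": "server", "f": "client"}


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> {neighbor: weight}."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return dict(adj)


def compute_measures(adj):
    """Compute per-node measures: degree and total incident edge weight."""
    return {
        node: {"degree": len(nbrs), "strength": sum(nbrs.values())}
        for node, nbrs in adj.items()
    }


def query(measures, kinds, kind, min_degree):
    """Select nodes by a stored attribute combined with a computed measure."""
    return sorted(
        n for n, m in measures.items()
        if kinds[n] == kind and m["degree"] >= min_degree
    )


def induced_subgraph(adj, nodes):
    """Keep only the edges whose endpoints are both in `nodes`."""
    keep = set(nodes)
    return sorted(
        (u, v, w) for u in keep for v, w in adj[u].items()
        if v in keep and u < v
    )


adj = build_adjacency(EDGES)
measures = compute_measures(adj)
busy_servers = query(measures, NODE_KIND, "server", min_degree=2)
sub_edges = induced_subgraph(adj, busy_servers)

# Tabular inspection of the selected nodes instead of a drawing.
for node in busy_servers:
    print(node, measures[node]["degree"], measures[node]["strength"])
```

The subset in `sub_edges` is small enough that you could draw it if you really wanted to, but the printed table is usually the more useful view.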

It used to be thought that visualizing super large graphs would reveal some kind of macro-scale structural insight, but it turns out that the visual structure ends up becoming dominated by the graph layout algorithm and the need to squash often inherently high-dimensional structures into 2 or 3 dimensions. You end up basically seeing patterns in the artifacts of the algorithm instead of any real structure.

There's a similar, but unrelated desire to overlay sequenced transaction data (like transportation logs) on a geographical map as a kind of visualization, which also almost never reveals any interesting insights. The better technique is almost always a different abstraction like a sequence diagram with the lanes being aggregated locations.

There's a bunch of these kinds of pitfalls in visualization that people who work in the space inevitably end up grinding against for a while before realizing it's pointless or there's a better abstraction.

(source: I used to run an infoviz startup for a few years that dealt with this exact topic)

6 comments

godelski 682 days ago

> But after a fairly small size, there's almost no utility in visualizing graphs.

I want to stress this point and go a bit further. It can be worse, because people have pareidolia[0], a tendency to see order in disorder, like how you see familiar shapes in the clouds. The danger with large visualizations such as these is that, instead of conveying useful information, you counterproductively convince someone that something untrue is true! Here's a relevant 3B1B video where this is kinda discussed. There is real meaning, but the point is that it is also easy to be convinced of things that aren't true[1]. In fact, Grant's teaching style is so good you might even convince yourself of the thing he is disproving as he is revealing how we are tricked by the visualization. Remember what the original SO person latched onto.

I think it's important to recognize that visualization is a nontrivial exercise. Grant makes an important point at the end, restating how the visualization was an artifact and how if you dig deep enough into an arbitrary question, you _can_ find value. Because at the end of the day, there are rules to these things. The same is true about graphs. There will always be value in the graph, but the point of graphing is to highlight the concepts that we want to convey. In a way, many people interpret what graphs are doing and why we use them backwards. You don't create visualizations to then draw value from them, but rather your plots are a mathematical analysis that is in a more natural language for humans. This is subtle and might be confusing because people often are able to intuit what kind of graph should be used to convey data but are not thinking about what the process is doing. So what I'm saying is that you don't want to use arbitrary graphs, but there's the right graph for the job. You can find a lot of blogs on graph sins[2] and this point will become clearer.

At heart, this is not so different from "lies, damned lies, and statistics." People often lie with data without stating anything that is untrue. With graphs, you can lie without stating a word, despite a picture being worth a thousand. So the most important part of being a data scientist is not lying to yourself (which is harder than it sounds).

[0] https://en.wikipedia.org/wiki/Pareidolia

[1] https://www.youtube.com/watch?v=EK32jo7i5LQ

[2] Except this might be hard because if you Google this you'll have a hard time convincing Google that you don't mean "sine". So instead search "graph deadly sins", "data visualization sins", "data is ugly", and so on. I'll pass you specifically the blog of "Dr. Moron" (Kenneth Moreland) and one discussion of bad plots https://www.drmoron.org/posts/better-plots/ (Ken is a data visualization expert and both his blogs have a lot on vis). There's also vislies: https://www.vislies.org/2021/

(Source: started my PhD in viz and still have close friends in infoviz and sciviz whose rants about their research I get to hear, and occasionally I contribute)


throwaway425933 682 days ago

My use case is that I have a graph of flops, latches, buffers, AND, OR, NOT gates and I want to visualize how data is changing/getting corrupted as it goes through each of them.


bane 682 days ago

It's likely that a better way to do this is not to "eat the elephant", but do it at some medium-scale or subcomponent level.

It sounds like perhaps what you are trying to do is something more like this?

http://visual6502.org/

Check out the visual simulations:

http://visual6502.org/sim/varm/armgl.html

http://visual6502.org/JSSim/index.html

http://visual6502.org/JSSim/expert-6800.html

I will say (and please forgive that digital circuits are not my field), there are almost certainly better techniques and approaches in the field to accomplish what you are trying to do. I would personally move away from what you are trying and seek insight in the domain that's able to produce multi-billion transistor microprocessors.

Perhaps there are tools for large-scale logic circuit simulation?

https://old.reddit.com/r/computerscience/comments/uhappo/bes...


bjourne 682 days ago

I did that a few years ago. It was a nice visualization up until restoring division. With more components than that, the layout just becomes too cluttered to be meaningful. And it is very difficult to encode all the heuristics one uses when drawing "pretty" circuit diagrams by hand into an algorithm.


tobbe2064 682 days ago

Can you recommend any good literature on the subject?


bane 682 days ago

Honestly, I've been away from the field for quite a long time so wouldn't be up to date. But, if you want kind of a good framing of the field, how it evolved and how it's different from other kinds of visualization (like scientific) maybe start here [0a][0b]

0a - https://www.cs.purdue.edu/homes/xmt/classes/slides/CS530/Inf...

0b - https://en.wikipedia.org/wiki/Data_and_information_visualiza...

There used to be a lively research field for information visualization that studied current visualization techniques and proposed new ones to solve specific challenges -- I remember when treemaps were first introduced, for example [1]. Large networks were a pretty big area of research at the time, with all kinds of centrality, clustering, and edge minimization techniques.

1 - https://www.google.com/search?q=treemap+visualization&tbs=im...

A few teams even tried various kinds of hyperbolic representations [2,3] so that areas under local inspection were magnified under your cursor, and the rest of the hairball was pushed off to the edges of the display. But with big graphs you run into quite a few big problems very quickly, like local vs. global visibility, layout challenges, etc.

2 - https://graphics.stanford.edu/papers/webviz/webviz/node2.htm...

3 - https://www.caida.org/catalog/software/walrus/
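For a feel of the "magnify under the cursor, push the rest to the edges" effect, here is a rough sketch. Note that it is not the hyperbolic layout used by the systems linked above; it swaps in the simpler fisheye magnification function g(x) = (d+1)x / (dx+1) from Sarkar and Brown's graphical fisheye views, applied radially around a focus point. The function names, the `radius` lens size, and the sample `LAYOUT` positions are all assumptions made up for illustration.

```python
import math


def magnify(x, d):
    """Fisheye magnification of a normalized distance x in [0, 1].

    d = 0 leaves x unchanged; larger d stretches the region near 0
    and compresses the region near 1.
    """
    return (d + 1) * x / (d * x + 1)


def fisheye(focus, point, radius, d):
    """Move `point` away from `focus` according to the fisheye function."""
    fx, fy = focus
    px, py = point
    dx, dy = px - fx, py - fy
    r = math.hypot(dx, dy)
    if r == 0 or r >= radius:
        return point  # the focus itself and anything outside the lens stay put
    new_r = magnify(r / radius, d) * radius
    scale = new_r / r
    return (fx + dx * scale, fy + dy * scale)


# Hypothetical node positions from some earlier layout step.
LAYOUT = {"n1": (0.1, 0.0), "n2": (0.5, 0.0), "n3": (0.0, 0.9), "n4": (2.0, 2.0)}

cursor = (0.0, 0.0)
distorted = {
    name: fisheye(cursor, pos, radius=1.0, d=3) for name, pos in LAYOUT.items()
}
for name in sorted(distorted):
    print(name, LAYOUT[name], "->", distorted[name])
```

Nodes near the cursor spread apart while nodes near the lens boundary get squeezed together, which is exactly where the local vs. global visibility trade-off shows up.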

Not specifically graph related, but the best critical thinker I know of in the space is probably Edward Tufte [4]. I have some problems with a few bits of his thinking, and other than sparklines his contributions are mostly in critically challenging what should be represented, why, how, and with what methods of interaction, but his critical analysis has stayed up there as some of the best. He has a book set that's a really great collection of his thoughts.

4 - https://www.edwardtufte.com/tufte/

If you approach this problem critically, you end up at the inevitable conclusion that trying to globally visualize a massive graph in general is basically useless. Sure, there are specific topologies that can be abstracted into easier-to-display graphs, but the general case is not conducive. It's also somewhat surprising how small a graph can be before visualizing it gets out of hand -- maybe a few dozen nodes and edges.

I remember the U.S. DoE did some really pioneering studies in the field and produced some underappreciated experts like Thomas, Cook and Risch [5,6]. I like Risch's concepts around visualizations as formal metaphors of data. I think he's successful in defining the rigorous atomic components of visualization that you can build up from. Considering OP's request in view of Tufte and Risch, I think that they really need to think about the potential for different metaphors at different levels of detail (since they specify zooming in and out). There may not exist a single metaphor that can visualize certain data at every conceivable scope and detail!

5 - https://ils.unc.edu/courses/2017_fall/inls641_001/books/RD_A...

6 - https://arxiv.org/pdf/0809.0884v1

One interesting artifact from all of this is that most of the research has long ago been captured and commoditized or made open source. There really isn't a market anymore for commercial visualization companies, or grant money for visualization research. D3.js [7] (and the derivatives) more or less took millions upon millions of dollars in R&D and commercial research and boiled it down into a free, open source library that captured pretty much all of the major findings in one place. It's objectively better than anything that was on the market or in labs at the time I was in the space, and it's free.

7 - https://d3js.org/


InGoldAndGreen 678 days ago

The one really helpful use for a massive nodegraph with way too much data? Convincing people that something is complicated. Eg: illustrating to non-technical people that your codebase is a massive mess.


elijahwright 682 days ago

Sometimes you can get farther with something like a summary statistics table of the different motifs that show up in a dataset.

Hairballs are not interesting, but the shapes that show up in a graph once you make a few cuts can be fascinating.
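As a rough illustration of both points, here is a small stdlib-only sketch. The toy edge list, the choice of triangles and open wedges as the "motifs", and the idea of cutting out one hub node are all assumptions picked for the example, not a claim about which motifs or cuts matter for real data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy graph: hub "h" ties together two small clusters.
EDGES = [
    ("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"),
    ("a", "b"), ("c", "d"), ("d", "e"), ("e", "c"),
]


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def motif_table(adj):
    """Count a few simple motifs and return them as a summary row."""
    wedges = 0  # pairs of neighbors around a center node
    closed = 0  # wedges whose two ends are also connected
    for nbrs in adj.values():
        for u, v in combinations(sorted(nbrs), 2):
            wedges += 1
            if v in adj[u]:
                closed += 1
    return {
        "nodes": len(adj),
        "edges": sum(len(n) for n in adj.values()) // 2,
        "triangles": closed // 3,  # each triangle is seen once per corner
        "open_wedges": wedges - closed,
    }


def components(adj, removed=frozenset()):
    """Connected components after cutting out the `removed` nodes."""
    seen, result = set(removed), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        result.append(sorted(comp))
    return result


adj = build_adjacency(EDGES)
table = motif_table(adj)
pieces = components(adj, removed={"h"})

for name, count in table.items():
    print(f"{name:12} {count}")
print("after cutting the hub:", pieces)
```

With the hub in place everything is one blob; after the cut, the two clusters fall out as separate pieces that are small enough to look at individually.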


PaulHoule 682 days ago

I am pretty sour about it and will call out people who post "just another hairball" and act like they've done something special.

I think there is a need for a tool that can extract and tell an interesting story based on a subgraph of a huge graph, but that takes thinking, unlike hairball plotting, AI image generation and other seductive scourges.

I went to a posthumous art show based on this guy

https://www.amazon.com/Interlock-Conspiracy-Shadow-Worlds-Lo...

where they showed how he drew 40 drafts with pencil of one of his graphs and went from a senseless hairball to something that seems immediately meaningful. Funny that might have something to do with his mysterious death... Maybe a tool that would help you do that is too dangerous for "them" to let you have!


godelski 682 days ago

I think you unknowingly answered your own question. The reason no such tool exists is that this stuff is very hard. Worse, it is something that sounds and looks easy. Terrible graphs and misleading ones are not the result of maliciousness and cunning deception. Rather, it is the opposite. Bad graphs happen because it is easy to visualize data, but hard to create good and meaningful visualizations[0]. It is because most people mindlessly apply a set of procedures to select the correct graph, not knowing the reasoning behind those procedures. It is in part due to the large number of people who have learned this and normalize/perpetuate the myth that visualization is easy, because they do not distinguish the action from the end result. In the same way, you might be able to perform all the manual tasks involved in assembling a house (use a screwdriver, hammer and nail, saw, fit pipes together, etc.), but it would be naive to assume that you could therefore assemble a house. The reason there are so many terrible graphs is that it is easy to build a shanty and you rarely see an actual house to tell you what you're missing.

I doubt we'd see such a tool anytime soon. It takes expert experience and skill to make good visualizations and there are no well defined rules. If you see such a tool, I'd be wary of promises that are too big to be kept.

[0] Sometimes people complain about how something has a difficult/steep learning curve. It is important to note that while frustrating, this does not always make the learning curve a bad thing. Often a shallow learning curve can be bad because it convinces one that they have far greater ability than they actually do. We could argue that this is in part due to the improper way we visualize learning curves.


bawolff 682 days ago

> Funny that might have something to do with his mysterious death... Maybe a tool that would help you do that is too dangerous for "them" to let you have!

This may be one of the most ridiculous conspiracy theories I have ever heard. Big data (heh) had him killed... ok.
