Hacker News
by mooneater 1298 days ago
20+ year fan of graphviz here, love seeing what you are doing with D2!

Any examples of larger diagrams (1000s of nodes), and how does layout perform on those?


Thank you! There aren't any right now; we'd need to add a simpler layout algorithm for those, e.g. force-directed, radial, etc. That use case is not something D2 is trying to tackle though. We're just focused on software architecture diagrams right now, and I've yet to see one span that many nodes.
I have made software architecture diagrams, but more "descriptive" than "prescriptive" ones, by generating graphviz from source code analysis tooling. (The objective here was getting oriented in a large codebase where there wasn't really a rigorous diagram of anything written down anywhere, and trying to find sort of a min-cut where putting a line through the graph of imports would cross the fewest edges)

In my experience graphviz handles up to thousands of nodes and tens of thousands of edges without really breaking a sweat.

edit: I've done the same thing with extracting AWS diagrams from aws describe* calls, with good results.
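For readers who want to try the "generate graphviz from source code analysis" idea above, here is a minimal sketch, assuming a Python codebase and using only the standard library `ast` module. The module names and sources in `SOURCES` are hypothetical, and this is not the commenter's actual tooling; it only shows the general shape of turning import statements into DOT edges.

```python
import ast

def imports_of(source):
    """Return the top-level module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            found.add(node.module.split(".")[0])
    return found

def to_dot(sources):
    """Emit a DOT digraph with one edge per (module -> imported module)."""
    lines = ["digraph imports {"]
    for module in sorted(sources):
        for dep in sorted(imports_of(sources[module])):
            # Keep only edges between modules we analysed (internal imports).
            if dep in sources and dep != module:
                lines.append(f'  "{module}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical three-module codebase, inlined as strings for the example.
SOURCES = {
    "app": "import db\nfrom util import helper\nimport os\n",
    "db": "import util\n",
    "util": "import sys\n",
}
DOT = to_dot(SOURCES)
print(DOT)
```

The printed text can be saved to a file and rendered with the Graphviz `dot` command, e.g. `dot -Tsvg imports.dot -o imports.svg`.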

Picking a set of domain specific targets and honing a tool to service that seems a good way to start building - rather than a general solver.

I also love graphviz but had a project to create an app for legible quality management, and it was a headache to shoehorn some concepts into a "nodes in an edge-connected graph" way of thinking (ended up programmatically scripting LaTeX).

With so many visualisation concepts there's room for niche approaches (same in plotting; Matlab/Octave vs Gnuplot vs Pyplot vs R and so on)

agreed! that reminds me, there's a company that specializes in this niche of visualizing large data: https://www.graphistry.com/ (haven't used it myself, looks well-made).
100% agreement, spot on observations! Indeed, we end up referring folks to cool diagramming tools all the time at Graphistry as d2/mermaid/diagrams.net are well-optimized for quick & beautiful diagram presentation tasks on manageably sized & fairly static datasets. Using Graphistry to do a quick markdown of how a 4 node cluster might work is a bit like driving a tank to pick up some milk :) Teams go to us more for gnarly investigation & splunking tasks that need a visual power tool, like looking at alert logs or big systems, so we optimize for making that scale interactive & easy.

The overlap is real however, so a lot of room for teams to learn. Ex: We're the easiest graph tool for jupyter/databricks/streamlit etc teams who use dataframes, and I can imagine those tools learning from us here. In the reverse direction, our work is more in terms of quickly configurable global data<>viz data bindings ("using the UI or API, bind each event's score to a hot-and-cold coloring palette and use a warning icon on all type=alert events"), but we have a ways to go to support the more manual artisanal effects of diagramming tools like Figma, where each element might have a super fancy & unique border style.

IDK if it's quite the use case you're looking for, but at my company, we turn our CI YAML for our monorepo into Dot (Graphviz), and graph it, to visualize the dependencies between steps in our CI, & to highlight the critical path.

It's a ~100-node DAG. Graphviz struggles a bit with it, particularly with edge layout.

(Though really, the layout I want for it is a Gantt chart style layout.)
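A rough sketch of the "graph the CI steps and highlight the critical path" idea, assuming the YAML has already been parsed into a dict of step name to (duration, dependencies). The step names and durations in `STEPS` are hypothetical, and this is not the commenter's pipeline. It uses the standard library `graphlib` (Python 3.9+) for the topological order and takes the longest path by duration as the critical path.

```python
from graphlib import TopologicalSorter

# Hypothetical CI steps: name -> (duration in minutes, steps it depends on).
STEPS = {
    "checkout": (1, []),
    "build": (10, ["checkout"]),
    "lint": (2, ["checkout"]),
    "test": (15, ["build"]),
    "deploy": (3, ["test", "lint"]),
}

def critical_path(steps):
    """Return (total duration, ordered step names) of the longest path."""
    finish = {}  # earliest finish time per step
    via = {}     # predecessor on the longest path into each step
    order = TopologicalSorter({name: deps for name, (_, deps) in steps.items()})
    for name in order.static_order():  # dependencies come before dependents
        duration, deps = steps[name]
        prev = max(deps, key=finish.__getitem__, default=None)
        via[name] = prev
        finish[name] = duration + (finish[prev] if prev is not None else 0)
    node = max(finish, key=finish.__getitem__)
    total, path = finish[node], []
    while node is not None:  # walk back along the recorded predecessors
        path.append(node)
        node = via[node]
    return total, path[::-1]

def to_dot(steps, path):
    """Emit DOT with the critical-path edges drawn in red."""
    hot = set(zip(path, path[1:]))
    lines = ["digraph ci {", "  rankdir=LR;"]
    for name, (_, deps) in steps.items():
        for dep in deps:
            style = " [color=red, penwidth=2]" if (dep, name) in hot else ""
            lines.append(f'  "{dep}" -> "{name}"{style};')
    lines.append("}")
    return "\n".join(lines)

TOTAL, PATH = critical_path(STEPS)
DOT = to_dot(STEPS, PATH)
print(TOTAL, PATH)
print(DOT)
```

`rankdir=LR` lays the DAG out left to right, which is only a loose approximation of a Gantt chart; Graphviz does not place nodes on a time axis.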

I am always quite astonished how badly the default layouters for graphs perform. When I was still doing compiler optimization in the early 2000s we did not struggle with quite big graphs thanks to cool graph visualizers such as vcg [1]. Two weeks ago I was tempted to try it again after nearly 20 years because I was frustrated even visualising a relatively small graph in Python (cytoscape seemed to be the only working software in the end but it was quite a pain to get it to just render what I wanted)

[1] https://www.rw.cdl.uni-saarland.de/people/sander/private/htm...

Laying out larger graphs is tricky, often because the size simply stands in the way of generating anything that's useful to the viewer. To add to that, most layout algorithms prioritize optimizing certain criteria, while other parts of the visualization emerge from that; and if the human viewer chooses a layout algorithm because of one of the latter properties, they are often surprised that the result doesn't look like they envisioned (because, hey, the algorithm optimized something entirely different). We see this disconnect fairly often in our own customer support, but I haven't really found a good way of putting an explanation in writing.

Then there's generally the problem of larger graphs, which tend to devolve into a tangled mess and hairballs, simply because they often tend to be well-connected. If there's no way of pruning them beforehand, or perhaps grouping, aggregating or clustering (in a way that makes sense to the viewer, not necessarily only structurally), then it can be hard to get a good result.
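To make the grouping/aggregating point concrete, here is a small sketch, assuming the viewer can supply a mapping from each node to a meaningful group. The node names, the prefix-based grouping rule, and the `collapse` helper are all hypothetical. Nodes are merged into their groups, edges inside a group are dropped, and parallel edges between groups are counted so the overview stays readable.

```python
from collections import Counter

def collapse(edges, group_of):
    """Merge nodes into their groups and count the edges between groups.

    edges:    iterable of (source, target) node pairs
    group_of: dict mapping each node to a group label
    """
    weights = Counter()
    for src, dst in edges:
        a, b = group_of[src], group_of[dst]
        if a != b:  # edges inside a group disappear from the overview
            weights[(a, b)] += 1
    return weights

def to_dot(weights):
    """Emit DOT where each group-to-group edge is labelled with its count."""
    lines = ["digraph overview {"]
    for (a, b), n in sorted(weights.items()):
        lines.append(f'  "{a}" -> "{b}" [label="{n}", penwidth={1 + n}];')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical dependency edges; the group is the prefix before the dot.
EDGES = [
    ("api.users", "db.users"), ("api.orders", "db.orders"),
    ("api.orders", "api.users"), ("web.cart", "api.orders"),
    ("web.login", "api.users"), ("web.cart", "web.login"),
]
GROUP_OF = {node: node.split(".")[0] for edge in EDGES for node in edge}
WEIGHTS = collapse(EDGES, GROUP_OF)
DOT = to_dot(WEIGHTS)
print(DOT)
```

Six node-level edges collapse to two group-level ones here. Whether the result is useful depends entirely on the grouping making sense to the viewer, which is the hard part the comment describes.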