| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by doctorleff 168 days ago

All data structures can be represented as graphs. I use the term "graph" for a collection of nodes (dots) and edges. (The rest of this paragraph introduces this concept of graph, as per this definition, for those not familiar with it.) Imagine a set of islands connected by bridges; the nodes are the islands; the bridges are the edges. [1] Seven Bridges of Konigsburg, Wikipedia, https://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsbe... Graph Theory, https://discrete.openmathbooks.org/dmoi3/ch_graphtheory.html A different kind of example is a conflict graph; the program reads a set of courses, a set of students, and for each student, the courses that student wants to take; the nodes would be courses; every time two or more students want to take the same two courses, the program creates an edge between those two courses. [2] [2] Runa Ganguli and Siddhartha Roy, "A Study on Course Timetabling based on the Graph Coloring Approach" International Journal of Computational and Applied Mathematics, Volume 2, No 17, 469-485.

A computer program would process this graph to schedule the courses so there are no conflicts or few conflicts. In other words, it would try to satisfy as many students as possible. This is in contrast with the term "graph" that one saw in high school or junior high school; that represents a function. An example would be the line chart where the height of a child is on the y-axis with their age on their x-axis and a point representing each time their height was taken and with lines connecting one height data point to another.

All data structures can be represented as graphs. For example, a hypergraph can be represented as a graph where each hyperedge corresponds to a node and connected to the nodes to which the hyperedge. Objects in a mechanical engineering CAD system or graphics display system are often kept as the winged-edge configuration. That is, we know for each edge, the adjacent faces and for each faces, the edges. Thus, the face is a "hyperedge" with the edges in the diagram being the nodes. [3] [3] https://en.wikipedia.org/wiki/Winged_edge Stanford Technical Report STAN CS 320 Bruce G. Baumgart, "Winged Edge Polyhedron Representation" http://i.stanford.edu/pub/cstr/reports/cs/tr/72/320/CS-TR-72... Charles Eastman and Kevin Walter Geometric Modeling Using the Euler Operators , Carnegie Mellon University DRC 15-279, May 1979

Of course, any graph can be serialized. Often, that would be done in JSON or XML. ChatGPT tells me that the time to serialize a graph is O(V+E) for adjacency lists and O(V^2) for adjacency matrices. That is, any data structure represented as a collection of pointers can be converted into text in time linear to the amount of information in the data structure. Adjacency matrices are used when we want to quickly see whether one entity is connected to another; but it is at the cost of space and time to serialize.

Assume one is tracking which students are taking (or are interested in taking) which course. In the computer, the programmer can put this into a rectangular array of size CS where C is the number of courses and S is the number of students. When dumped into text naively into text, this would take space and time writing to disk proportional to CS. On the other hand, assume that this is a sparse array; on average, each student is only interested in taking 10 courses. We can represent this as a list of average size 10 for each student, or time 10S. (Or more precisely, 10S+C.) We call in computer science, the relation between students and courses as a many-many relationship.

That is the power of sparsity, it reduces the time from a product to a linear function. (The classic graph is a many-many relation of something to itself. That is, which island is connected to another island by a bridge, which course is connected to another one by a student interested in both, or which city is connected to another city by a direct flight.) The average number of connections for each entity to another entity is the sparsity, m. Thus, the time to write the data for a sparse representation is represented by mN where N is the number of entities (or nodes).

By a little verbal sleight of hand, we say that the many-many relationship of students courses is a graph where some nodes are labled "student" and others are represented "course."

Throughout the above discussions, I ignore the constant which is the time to write one connection to the file; in this discussion, I ignore it in most of the discussion for simplicity. Similarly, with space, there is a proportionality constant--how many bits or bytes does it take to record one student-course connection or one bridge in the island-graph example.

As an aside not relevant to my discussion but relevant to the entire discussion, I just saw a news article on storing JSON on binary. https://devclass.com/2024/01/16/sqlites-new-support-for-bina...