|
All data structures can be represented as graphs. I use the term "graph" for
a collection of nodes (dots) and edges. (The rest of this paragraph
introduces this concept of graph, as per
this definition, for
those not familiar with it.)
Imagine a set of islands connected
by bridges; the nodes are the islands; the bridges are the edges.
[1] Seven Bridges of Konigsburg, Wikipedia,
https://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsbe...
Graph Theory,
https://discrete.openmathbooks.org/dmoi3/ch_graphtheory.html
A
different kind of example is a conflict graph;
the program reads a set of courses, a set
of students, and for each student, the courses that
student wants to take;
the nodes would be courses; every time two or more students
want to take the same two courses,
the program creates an edge between those two courses. [2]
[2] Runa Ganguli and Siddhartha Roy,
"A Study on Course Timetabling based on the Graph Coloring Approach"
International Journal of Computational and Applied Mathematics,
Volume 2, No 17, 469-485. A computer program would process this graph to schedule the courses
so there are no conflicts or few conflicts. In other words, it would
try to satisfy as many students as possible.
This is in contrast with the term "graph" that one saw in high school or
junior high school; that represents a function. An
example would be the line chart where the height of a child
is on the y-axis with their age on their x-axis and a point representing
each time their height was taken and with lines connecting one height data
point to another. All data structures can be represented as graphs.
For example, a hypergraph can be represented as a graph where each hyperedge corresponds to a node and connected to the nodes to which the hyperedge.
Objects in a mechanical engineering CAD system
or graphics display
system are often kept as
the winged-edge configuration. That is, we know for each edge, the adjacent faces and for each faces, the edges.
Thus, the face is a "hyperedge" with
the edges in the diagram being
the nodes. [3]
[3] https://en.wikipedia.org/wiki/Winged_edge
Stanford Technical Report STAN CS 320
Bruce G. Baumgart, "Winged Edge Polyhedron Representation"
http://i.stanford.edu/pub/cstr/reports/cs/tr/72/320/CS-TR-72...
Charles Eastman and Kevin Walter
Geometric Modeling Using the Euler Operators , Carnegie
Mellon University DRC 15-279, May 1979 Of course, any graph can be serialized.
Often, that would be done
in JSON or XML.
ChatGPT tells me that
the time to serialize a graph is O(V+E) for adjacency lists and O(V^2) for
adjacency matrices.
That is, any data structure represented
as a collection
of pointers
can be converted into text in time
linear to the amount of information
in the data structure.
Adjacency matrices are used when we want to quickly see whether one entity
is connected to another; but it is at the cost of space and time to serialize. Assume one is tracking which students are taking (or are interested in taking)
which course. In the computer, the programmer can put this into a rectangular
array of size CS where C is the number of courses and S is the number of
students. When dumped into text naively into text, this would take space
and time writing to disk
proportional to CS. On the other hand, assume that this is a sparse
array; on average, each student is only interested in taking 10 courses.
We can represent this as a list of average size 10 for each student, or
time 10S. (Or more precisely, 10S+C.)
We call in computer science, the relation between students and courses as
a many-many relationship. See also: [4]
https://stackoverflow.com/questions/51783/how-to-serialize-a... That is the power of sparsity, it reduces the time from a product to a linear
function. (The classic graph is a many-many relation
of something to itself. That is, which island is connected to another
island by a bridge, which course is connected to another one by a
student interested in both, or which city is connected to another city
by a direct flight.) The average number of connections for
each entity to another entity is the sparsity, m.
Thus, the time to write the data for a sparse representation is
represented by mN where N is the number of entities (or nodes). By a little verbal sleight of hand, we say that the many-many relationship
of students courses is a graph where some nodes are labled "student" and others
are represented "course." Throughout the above discussions, I ignore the
constant which is the time to write one connection to
the file; in this discussion, I ignore it
in most of the discussion for simplicity.
Similarly, with space, there is a proportionality constant--how many bits
or bytes does it take to record one student-course connection or one bridge
in the island-graph example. As an aside not relevant to my discussion but relevant to
the entire discussion, I just saw a news article on storing JSON on binary.
https://devclass.com/2024/01/16/sqlites-new-support-for-bina... |