Hacker News new | ask | show | jobs
by ylow 842 days ago
I think this is because a graph is not a data-structure nor a data-type. It is really an abstraction.

Fundamentally, all I need to define a graph is a set of vertices v \in V and function Neighbors(v). And that really is all is needed for the most foundational set of graph algorithms.

Everything else are case-by-case constraints. Does A->B imply B->A? is the node set partitionable with certain constraints? Are there colors? labels?

To make things even more general I can go up one level and consider the hypergraph. In which case I just have a set of vertices, and a set of sets of vertices. This can be represented in a myriad of different ways depending on what you are interested in. Of which (non-hyper) graph is simply a special case.

An alternative way to think about it perhaps from the database perspective, is that its a query optimization and indexing problem. Depending on what questions you want to ask about the graph, there will be different ways to represent the graph to answer the question better. Just like there is not one way to represent the abstraction called "Table", there is not one way to do "Graph" either. It really depends on the questions you care about.

3 comments

Yes, graphs are ubiquitous because they are so abstract. They live on the same level of abstraction as pure numbers. There are useful "numerical" libraries that exist, and by analogy I think you could say there are also useful "graphical" libraries that exist. But we don't really have "number" libraries, and we don't really have "graph" libraries, because those concepts are a bit too abstract to write APIs against.
it's true that numbers are very abstract, which is what makes it so easy to design apis for them

the python runtime includes four built-in number types (small integer, arbitrary-precision integer, float, and complex) and the python standard library includes two more number types (decimal and fractions), and one of the most popular non-standard libraries for python is numpy, which provides some other kinds of numbers such as single-precision floats, vectors, and matrices. other systems like pari/gp have number libraries that provide other kinds of numbers, such as p-adic numbers and galois field elements

the only programming languages i've ever used that didn't have 'number' libraries were esoteric languages like brainfuck and the lambda calculus

numbers have all of mathematics as a background which is what makes it so easy to design apis for them

graphs are a much newer development, I think there's a very deep connection between category theory and graphs in general (and also computers make both much more useful somehow)

lambda calculus can be used to define numbers but it's a wonky construction, it's reminiscent of how sets can also be used to define numbers.

that seems reasonable
> Fundamentally, all I need to define a graph is a set of vertices v \in V and function Neighbors(v).

Even that is severely overconstrained. It doesn't allow multiple edges to the same neighbor!

Maybe the naming would be a little weird for that use case, but they didn't specify what the output of `Neighbors(v)` is; I don't see any reason why it couldn't return a multiset or a relation (w, c) where `c` is the number of connections between `v` and `w`
Returning the same neighbor multiple times kind of misses the point. The point was that you need to return edges (not neighbors) because the edges connecting the same neighbors can be different.

Like, imagine two train lines between the same pair of stations. Or two roads between the same intersections. They might have different travel times, costs, etc.

I still don't see how this couldn't work with a `Neighbors(v)` function with an unspecified return type. The outputs I gave were examples of how it could be adapted for various use cases, not an exhaustive list of all possibilities; in the example with multiple edges with multiple weights, the output could instead be a relation of (v2, w, c) that indicates that v connects to v2 with weight w with c as the number different edges between the two with that weight.
In all of your earlier examples (and actually, including the current one too), you're treating vertices as first-class objects, but edges as second-class. There's no way to even identify an edge in your formulation -- only vertices.

This is a common oversight in graph structures that ends up being very annoying in many applications. You keep trying to work around it by associating the edge's properties with those of the vertex pairs and hoping that's sufficient for the application, but I'm trying to point out that the abstraction you're implicitly dancing around -- and the one that many practical uses need -- is actually one that treats edges as first-class. In fact, I would go further and say that if anything should be second-class, it ought to be the vertices, since they're already implied by the edges. (That is to say, for many practical applications of graphs, an edge determines its endpoints, but the endpoints don't determine the edge.)

I'm not clear what an abstraction that makes edges first class but vertices second class looks like. An edge connects two vertices, so if two edges connect to the same vertex, what does this look like?
Well to be fair, that constraint is also part of the mathematical definition of a graph, where the set of edges E is a binary relation over vertices V (i.e., a subset of V x V). You'd need either a multiset or a labelled relation (i.e., a subset of V x L x V for some set of labels L) to overcome that.
There is no "the" definition. From Wikipedia:

"Definitions in graph theory vary. [...] A multigraph is a generalization that allows multiple edges to have the same pair of endpoints. In some texts, multigraphs are simply called graphs."

It's a bit disingenuous to skip over the Graph subsection of that article, right after the "definitions vary" line:

> A graph (sometimes called an undirected graph to distinguish it from a directed graph, or a simple graph to distinguish it from a multigraph) is a pair G = (V, E), where V is a set whose elements are called vertices (singular: vertex), and E is a set of unordered pairs of vertices, whose elements are called edges (sometimes links or lines).

An unqualified "graph" is almost always this one—a simple, undirected graph. If you mean something different you almost always need to use one of the more specific names to be clear.

https://en.m.wikipedia.org/wiki/Graph_(discrete_mathematics)...

"Disingenuous"? Do you have to make this personal before looking for another explanation?

The parent comment I replied to said:

"Does A->B imply B->A?"

That "undirected" condition was already violated before I wrote anything.

Sorry, I didn't intend to make it personal, I was just pointing out that the very next paragraph after the chunk you quoted included the definition of "graph" that lou1306 was referring to, almost verbatim.

Definitions sometimes vary, but lou1306 is correct on the merits that the most widely accepted definition of an unqualified "graph" states that "the set of edges E is a binary relation over vertices V (i.e., a subset of V x V)".

> hypergraph

> I just have a set of vertices

> and a set of sets of vertices

Sounds kind of like a file system to me. Files are the vertices. Directories are the nestable sets of vertices.