|
|
|
|
|
by PaulHoule
988 days ago
|
|
There are "graph databases" which see graphs as a universal approach to data, see RDF and SPARQL and numerous pretenders. For that matter, think of a C program where the master data structure is a graph of pointers. In a graph like that there is usually a huge number of different edge types such as "is married to", "has yearly average temperature", ... Then there are "graph algorithms" such as PageRank, graph centrality, and such. In a lot of those cases there is one edge type or a small number of edge cases. There are some generic algorithms you can apply to graphs with many typerd edges edges such as the magic SPARQL pattern ?s1 ?p ?o .
?s2 ?p ?o .
which finds ?s1 and ?s2 that share a relationship ?p with some ?o and is the basis for a similarity metric between ?s1 and ?s2. Then there are the cases that you pick out nodes with some specific ?p and apply some graph algorithm to those.The thing about graphs is, in general, they are amorphous and could have any structure (or lack of structure) at all which can be a disaster from a memory latency perspective. Specific graphs usually do have some structure with some locality. There was a time I was using that magic SPARQL pattern and wrote a program that would have taken 100 years to run and then repacked the data structures and discovered an approximation that let me run the calculation in 20 minutes. Thus practitioners tend to be skeptical about general purpose graph processing libraries as you may very have a problem that I could code up a special-purpose answer to in less time than you'll spend fighting with the build system for that thing that runs 1000x faster. ---- If you really want to be fashionable though, arXiv today is just crammed with papers about "graph neural networks" that never seem to get hyped elsewhere. YOShInOn has made me a long queue of GNN papers to look at but I've only skimmed a few. A lot of articles say they can be applied to the text analysis problems I do but they don’t seem to really perform better than the system YOShInOn and I use so I haven’t been in a hurry to get into them. |
|
A graph of typed pointers. As you likely know, the basic element of RDF is not “foo has a relationship with bar”, but “foo has a relationship with bar of type baz”.
Also, the types themselves can be part of relationships as in “baz has a relationship with quux of type foobar”
> The thing about graphs is, in general, they are amorphous and could have any structure (or lack of structure) at all which can be a disaster from a memory latency perspective
But that’s an implementation detail ;-)
In theory, the engine you use to store the graph could automatically optimize memory layout for both the data and the types of query that are run on it.
Practice is different.
> Thus practitioners tend to be skeptical about general purpose graph processing libraries
I am, too. I think the thing they’re mostly good for is producing PhD’s, both on the theory of querying them, ignoring performance, and on improving performance of implementations.