Hacker News new | ask | show | jobs
by yummyfajitas 5585 days ago
In a connected graph, you don't want to cache this [1]. It will take up O(N^2) space. If you have 100k nodes, that would require 10^10 cached edges.

Further, insertion time becomes worst case O(N^2). Consider a graph with two connected components (each having O(N/2) nodes). Once you add an edge that connects them, updating the view will require O(N^2/4) operations - you need to connect every node in the first component to every node in the second.

A graph database is more specialized than SQL, so they can perform a very simple optimization. You don't scan the database, you just follow links. Links are typically stored with the node, so it's O(1) time.

[1] In a connected graph, there is a constant time algorithm - return True. But that won't work if you also want edge distance.

1 comments

You're right about caching. I wasn't sure if graph databases took the typical NoSQL approach of massive non-normalized tables to avoid computational overhead.

SQL doesn't scan either -- tables are indexed via either a hash (O(1)) or a tree (O(log edges)). While I do realize the speed advantage of a specialized structure, I feel this could be better implemented as a specialized index/storage mechanism in SQL than as an entirely new database system.