|
|
|
|
|
by bluecoconut
125 days ago
|
|
Fun read. One upside of the deterministic schemes is they include provenance/lineage. Can literally "trace up" the path the history back to the original ID giver. Kinda has me curious about how much information is required to represent any arbitrary provenance tree/graph on a network of N-nodes/objects (entirely via the self-described ID)? (thinking in the comment: I guess if worst case linear chain, and you assume that the information of the full provenance should be accessible by the id, that scales as O(N x id_size), so its quite bad. But, assuming "best case" (that any node is expected to be log(N) steps from root, depth of log(N)) feels like global_id_size = log(N) x local_id_size is roughly the optimal limit? so effectively the size of the global_id grows as log(N)^2? Would that mean: from the 399 bit number, with lineage, would be a lower limit for a global_id_size be like (400 bit)^2 ~= 20 kB (because of carrying the ordered-local-id provenance information, and not relative to local shared knowledge) |
|
Provenance is a DAG, so you get a partial order for free by topological sort. That can be extended to a compatible total order. Then provenance for a node is just its ordering. This kind of mapping from objects to the first N consecutive naturals is also a minimal perfect hash function, which have n log n overhead. We can't navigate the tree to track ancestry, but equality implies identical ancestry.
Alternatively, we could track the whole history in somewhat more bits with a succinct encoding, 2N if it's a binary tree.
In practice, deterministic IDs usually accept a 2^-N collision risk to get log n.