| HN Mirror


by xamde 955 days ago

> The document is primarily intended for the following audiences:

    Software developers that want to implement an RDF dataset canonicalization algorithm.
    Masochists.

Pick your side.

2 comments

no_wizard 955 days ago

In case anyone is wondering, it's really what it says:

https://www.w3.org/TR/2023/CR-rdf-canon-20231031/#how-to-rea...


oever 955 days ago

I've yet to see a convincing use-case to canonicalize RDF graphs.

The document cites these use-cases:

There are different use cases where graph or dataset canonicalization are important:

    Determining if one serialization is isomorphic to another.
    Digital signing of graphs (datasets) independent of serialization or format.
    Comparing two graphs (datasets) to find differences.
    Communicating change sets when remotely updating an RDF source.

These are not real-world use-cases. Why would one want to sign independently of serialization or format? The real-world need is that people start signing graphs. But why would they sign some abstract format that is independent of the serialization format? That supposedly independent format is a format too and will have competition soon. It's the way of the world: fork, fork, fork.

I'm signing my RDF graphs as bytearrays with PGP and avoid all the hassle.


slaymaker1907 955 days ago

I assume that serialization formats might reference this standard so that they don’t need to reinvent the wheel that is graph normalization.

> A canonicalization algorithm is necessary, but not necessarily sufficient, to handle many of these use cases.

It’s kind of like how there’s a standard for structured cloning of JS objects that gets used for things like the web worker spec.

Signing something independent of serialization might be useful since then the exact serialization format can vary. For example, maybe the data is already serialized using SQLite. I’d prefer to avoid loading the data into memory and reserializing it just to check the signature. Instead, it’d be nice to just canonicalize it and then utilize the indexing capabilities of SQLite to minimize memory usage.


oever 954 days ago

So the use-case amounts to a very tentative optimization. This tentative optimization is achieved by introducing a very complicated algorithm that is not guaranteed to run in finite time.

You could also check signatures when loading the data and keep the original bytearray separately in slow/cheap storage.

That way you can sign RDF graphs like you sign any bytearray and keep a simple design.
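
As a rough illustration of that flow, here is a minimal Python sketch that verifies a signature over the stored bytes before anything is parsed. It uses an HMAC from the standard library purely as a stand-in for a PGP signature; `sign_bytes`, `load_verified`, the key, and the example triple are all hypothetical names and values made up for the sketch.

```python
import hashlib
import hmac


def sign_bytes(key: bytes, payload: bytes) -> bytes:
    """Sign the exact bytes of a serialized graph (HMAC stands in for PGP here)."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def load_verified(key: bytes, payload: bytes, signature: bytes) -> bytes:
    """Return the payload only if the signature matches its exact bytes."""
    expected = sign_bytes(key, payload)
    if not hmac.compare_digest(expected, signature):
        raise ValueError("signature does not match the stored bytes")
    return payload


key = b"demo-key"  # assumption: a shared secret, only for this sketch
original = b'<http://example.org/s> <http://example.org/p> "o" .\n'
signature = sign_bytes(key, original)

# The archived bytes verify as-is.
verified = load_verified(key, original, signature)

# The same triple with different whitespace is a different bytearray,
# so it would fail verification against the same signature.
reformatted = b'<http://example.org/s>  <http://example.org/p>  "o" .\n'
```

The design stays simple precisely because it treats the graph as opaque bytes; the cost is that the signature only ever covers that one archived serialization.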


nextaccountic 955 days ago

When you sign an RDF graph as a bytearray, how do you cope with the fact that multiple bytearrays serialize the same graph?
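
To make the question concrete, a small Python sketch (standard library only; the example IRIs and blank node labels are made up) of two N-Triples documents that describe the same graph but hash differently:

```python
import hashlib

# Two N-Triples documents for the same graph: one blank node with a name
# that knows Alice. Only the blank node label and the line order differ.
doc_a = (
    b'_:b0 <http://xmlns.com/foaf/0.1/name> "Bob" .\n'
    b'_:b0 <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice> .\n'
)
doc_b = (
    b'_:x <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice> .\n'
    b'_:x <http://xmlns.com/foaf/0.1/name> "Bob" .\n'
)

digest_a = hashlib.sha256(doc_a).hexdigest()
digest_b = hashlib.sha256(doc_b).hexdigest()
same_digest = digest_a == digest_b  # the byte-level digests differ

# Sorting the lines removes the ordering difference, but the blank node
# labels still differ, so the sorted documents are not equal either.
same_after_sort = sorted(doc_a.splitlines()) == sorted(doc_b.splitlines())
```

Any signature computed over `doc_a` would therefore not verify against `doc_b`, even though both parse to the same graph.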


oever 954 days ago

I don't. Why would you want to cope with that? When does it matter that multiple bytearrays give the same graph?


hobofan 954 days ago

I used RDF canonicalization in a system that built a computation graph in which the inputs and outputs of each computation were one or more RDF graphs.

Many of the computations were doing things like inference that created new blank nodes, and were also doing so in a non-deterministic order, and at the same time many computations created structurally identical outputs (with a low cardinality of triples). By using RDF canonicalization as the basis for content addressing those small graphs, it became quite easy to avoid re-doing a lot of the computations that would have happened due to non-deterministic order. For larger graphs we just used a hash of the native serialization, as re-doing the computation was cheaper than trying to canonicalize.

Adding that canonicalization-based system gave the whole system a significant performance boost, so yeah, there are some scenarios where you "would want to cope with that".
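
The content-addressing idea can be sketched roughly as follows. This is not the W3C algorithm and not the code from that system: it is a deliberately naive Python stand-in that tries every relabeling of the blank nodes and keeps the lexicographically smallest sorted serialization, which is only workable for very small graphs (the cost is factorial in the number of blank nodes). The names `canonical_key` and `run_once`, the term encoding as N-Triples-style strings, and the example IRIs are all assumptions made for the sketch.

```python
import hashlib
from itertools import permutations

Triple = tuple[str, str, str]  # terms written as N-Triples-style strings


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def canonical_key(triples: set[Triple]) -> str:
    """Hash that is stable under blank node relabeling and triple order.

    Brute force: try every assignment of canonical labels to the blank
    nodes and keep the smallest sorted serialization.
    """
    blanks = sorted({t for tr in triples for t in (tr[0], tr[2]) if is_blank(t)})
    best = None
    for perm in permutations(range(len(blanks))):
        mapping = {b: f"_:c{i}" for b, i in zip(blanks, perm)}
        candidate = "\n".join(sorted(
            " ".join(mapping.get(term, term) for term in triple) + " ."
            for triple in triples
        ))
        if best is None or candidate < best:
            best = candidate
    return hashlib.sha256((best or "").encode("utf-8")).hexdigest()


cache: dict[str, object] = {}


def run_once(graph: set[Triple], compute):
    """Skip the computation when an isomorphic input was already processed."""
    key = canonical_key(graph)
    if key not in cache:
        cache[key] = compute(graph)
    return cache[key]


# Same structure, different blank node labels.
g1 = {("_:a", "<http://example.org/p>", "_:b"), ("_:b", "<http://example.org/q>", '"1"')}
g2 = {("_:y", "<http://example.org/q>", '"1"'), ("_:x", "<http://example.org/p>", "_:y")}
# Different structure: both triples hang off the same blank node.
g3 = {("_:a", "<http://example.org/p>", "_:b"), ("_:a", "<http://example.org/q>", '"1"')}

calls = []
run_once(g1, lambda g: calls.append("ran") or len(g))
run_once(g2, lambda g: calls.append("ran") or len(g))  # cache hit, not recomputed
```

A real deployment would presumably swap `canonical_key` for a proper canonicalization implementation and, as described above, fall back to hashing the native serialization once graphs get large.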
