| (disclaimer: I am an author of the paper) Thanks for your comments. First off: yes, most (perhaps all) of the applied methods are not novel; some of them have been around for a long time. We only claim novelty in how these existing methods are combined to solve the problem of data availability and integrity on the web. Yes, the magnet URI scheme is highly related, and we probably should have referred to it in one way or another. However, there are crucial features that magnet links do not provide (as far as I know): you cannot generate a hash that represents content on a more abstract level than byte sequences (MIME types by themselves don't solve that problem), nor can you have self-references. All of the features from our list of requirements are supported by some approaches, but (to our knowledge) no approach supports all of them at the same time. In terms of search engines caching research data, I agree! We shouldn't trust existing providers too much, but should instead build a dedicated decentralized infrastructure for scientific purposes (this is what I am working on now). I am sure the performance measures can be improved (incremental cryptography might allow us to get rid of sorting altogether). The shape of the curve is, however, not much affected by whether the statements are already sorted (they are not sorted for TransformRdf and TransformLargeRdf!). I hope this clarifies some things. |
But I don't think I understand your concern about abstract hashing and how it would need to be something fundamentally new. Both the order normalization and the self-reference are simply preprocessing stages on your data, albeit in slightly different forms. The sortedness requirement, I think, is captured by MIME type parameters (like the "charset=" in "text/html;charset=UTF-8"), as it does not change the fact that the document is an RDF graph. For the placeholder trick, I think you're right, and you'd want something like a "text/rdf+selfref" MIME type to indicate that it is not in fact valid RDF until preprocessing has been performed. All told, your RDF module would be described in MIME as something like "text/rdf+selfref;sorted=".
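To make the "preprocessing stages" point concrete, here is a rough Python sketch of what I have in mind. It is not the algorithm from the paper: the `<SELF>` placeholder, the function names, the one-statement-per-line format, and the choice of SHA-256 are all my own assumptions for illustration, and it ignores blank nodes entirely.

```python
import hashlib

# Hypothetical marker that stands in for the document's own hash.
PLACEHOLDER = "<SELF>"


def normalize(statements):
    """Order normalization: trim, drop empty lines, deduplicate, sort."""
    return sorted({s.strip() for s in statements if s.strip()})


def content_hash(statements):
    """Hash the normalized statements rather than the raw byte sequence."""
    canonical = "\n".join(normalize(statements)) + "\n"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def resolve_self_references(statements, digest):
    """Replace the placeholder with the final hash to get the published form."""
    return [s.replace(PLACEHOLDER, digest) for s in statements]


def verify(resolved_statements, digest):
    """Put the placeholder back, re-run the preprocessing, compare hashes."""
    restored = [s.replace(digest, PLACEHOLDER) for s in resolved_statements]
    return content_hash(restored) == digest


statements = ["<SELF> <p> <b> .", "<a> <p> <SELF> ."]
digest = content_hash(statements)
published = resolve_self_references(statements, digest)
print(verify(published, digest))
```

Statement order does not affect the digest, and the published document can mention its own hash because the placeholder is restored before hashing. Both steps run before the hash function sees any bytes, which is why I would call them preprocessing.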