by j-pb 1459 days ago
> People get seduced by specifications that don't really specify anything.

Like the RDF 1.1 Spec?[https://www.w3.org/TR/rdf11-concepts/]

The whole "abstract syntax" shenanigans that RDF pulls is one of its biggest flaws. It makes the entire ecosystem huge and unwieldy, and has few upsides besides letting everybody use their favourite serialisation flavour.

It makes things like canonical representations for content-addressable hashing and signing pretty much impossible, which is a huge detriment to proper authentication and provenance tracking.
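To make the hashing problem concrete, here is a small sketch (the example.org names are made up for illustration): the same single statement written in two serialisations yields two different byte strings, so a plain content hash cannot tell that they mean the same thing.

```python
import hashlib

# One statement, two serialisations. The example.org identifiers are
# hypothetical and only serve the illustration.
ntriples_form = (
    "<http://example.org/a> <http://example.org/knows> <http://example.org/b> .\n"
)
turtle_form = (
    "@prefix ex: <http://example.org/> .\n"
    "ex:a ex:knows ex:b .\n"
)


def digest(text: str) -> str:
    """SHA-256 over the UTF-8 bytes of a serialised document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Same meaning, different bytes, therefore different hashes.
hashes_differ = digest(ntriples_form) != digest(turtle_form)
```

Any signature computed over one of these forms says nothing about the other unless both sides first agree on a single canonical byte representation.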

It also pulls in all of these other open-ended standards, where everything and anything is a valid subject identifier, so long as it's URI resolvable by http (which is pretty vague and random).

Subjects and predicates should have always just been 16-byte random UIDs, which at the same time would have delivered us from the bane of blank nodes, endless discussions on predicate names, and broken links.

The object part should also have been limited in the amount of data it can hold; just hash anything bigger and store it in some form of content-addressable blob store.
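A rough sketch of what that could look like, under assumptions of my own: the 32-byte inline cap, the SHA-256 hash, the in-memory dict standing in for a blob store, and the names `new_id` and `make_triple` are all hypothetical choices, not part of any spec.

```python
import hashlib
import os

MAX_INLINE = 32  # hypothetical cap on inline object size, in bytes

# Stand-in for a content-addressable blob store: digest -> bytes.
blob_store = {}


def new_id() -> bytes:
    """A 16-byte random identifier for a subject or predicate."""
    return os.urandom(16)


def make_triple(subject: bytes, predicate: bytes, value: bytes):
    """Build a triple; values over the cap are stored by hash instead."""
    if len(value) <= MAX_INLINE:
        obj = ("inline", value)
    else:
        key = hashlib.sha256(value).digest()
        blob_store[key] = value  # the blob lives outside the triple
        obj = ("hash", key)
    return (subject, predicate, obj)
```

With this layout every triple has a bounded size, and a large value is referenced by a digest that anyone can recompute and check.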

2 comments

> so long as it's URI resolvable by http

It doesn’t have to be. URI is a pretty broad concept and URLs are just a subset. It’s perfectly fine to identify an entity with other URIs that are not URLs. If you’re talking about a book, for instance, you could use the ISBN: “urn:isbn:0-486-27557-4”

The usefulness of using resolvable URLs as URIs is just that if you have absolutely no knowledge about the resource except its URI, and that URI happens to be a resolvable URL, then at least you know where to go looking to find out more.
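As a small illustration of the distinction, Python's standard `urllib.parse` handles both kinds of identifier. The helper `looks_resolvable` and the example.org URL are made up for this sketch, and "has an http(s) scheme and a host" is only a crude stand-in for "resolvable".

```python
from urllib.parse import urlparse


def looks_resolvable(uri: str) -> bool:
    """Crude check: an http(s) URI with a host is something you could fetch."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


isbn_urn = "urn:isbn:0-486-27557-4"        # a URI, but not a URL
book_url = "http://example.org/books/42"   # hypothetical resolvable URL

urn_parts = urlparse(isbn_urn)
# urn_parts.scheme is "urn" and urn_parts.netloc is empty: there is
# nowhere to send a request, yet the identifier is still a valid URI.
```

Both strings work equally well as identifiers; only the second also tells you where to look for more information.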

The URI resolution idea is 99% crap.

That is, most of the time you don't want to publish subjects and predicates as resolvable URIs. However, people see so many examples of http:... that they don't realize it's even possible to make non-resolvable URIs.

I used random UUIDs all the time but that is a super-fraught area since some people really want them to be in temporal sequence so their database index is happy.

I've also done the content addressable blob store thing.

I've had some pretty good experiences with using the following random UID scheme.

A 32-bit millisecond timestamp that just rolls over, i.e. truncate(current_time_ms()), concatenated with 96 bits of crypto-grade entropy, for 16 bytes in total.

You get both nice properties: database index locality, plus enough entropy that you can sleep well without worrying about collisions, even as the timestamp space gets more densely populated with every overflow (roughly every 50 days).
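A minimal sketch of that scheme, assuming the entropy part is 96 bits so the whole thing fits the 16-byte IDs discussed above (32 + 96 = 128 bits). The name `make_uid` and the big-endian layout are my own choices for illustration.

```python
import os
import struct
import time


def make_uid(now_ms=None) -> bytes:
    """16 bytes: a wrapping 32-bit millisecond timestamp, then 96 random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    prefix = now_ms & 0xFFFFFFFF  # keep the low 32 bits; wraps every 2**32 ms
    # Big-endian prefix so byte-wise ordering follows time within one epoch.
    return struct.pack(">I", prefix) + os.urandom(12)


# 2**32 milliseconds expressed in days: the rollover period.
ROLLOVER_DAYS = 2**32 / (1000 * 60 * 60 * 24)
```

Within one rollover period, IDs created at different milliseconds sort by creation time, which is what keeps index inserts local. After a wrap the prefixes start over, so uniqueness rests entirely on the 96 random bits.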

It's also what PostgreSQL uses for its index-friendly UID format. :D

I think the proposed UUID v7 is sortable, FWIW.

https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...