by PaulHoule 1459 days ago
People get seduced by specifications that don't really specify anything.

Conversely, there is a lot of pushback against W3C standards because they are specific. Unfortunately, people don't see that as freedom (the freedom to choose tools that interoperate), and they don't see the slavery in being stuck with poorly specified "standards" that are controlled by one entity.

GraphQL is improved (we almost know what the algebra is) but originally it was an asymmetric specification meant to keep power in the hands of Facebook.

That is, they didn't want to specify the exact rules for traversing the graph, because they have commercial reasons for controlling what information you can get, plus some responsibility to protect users' privacy.

Schema.org was another asymmetric standard, as it wasn't particularly good for exposing semantic metadata that people could consume as such (it took me a few years to really figure out how) but it was great for companies like Google to make a training set that would ultimately let them extract entities from documents that aren't marked up. It achieved some popularity because there is no limit to the hoops people will jump through if it improves their SERP rating from 78 to 55.

1 comments

> People get seduced by specifications that don't really specify anything.

Like the RDF 1.1 spec? [https://www.w3.org/TR/rdf11-concepts/]

The whole "abstract syntax" shenanigans that RDF pulls is one of its biggest flaws. It makes the entire ecosystem huge and unwieldy, and has few upsides besides giving everybody their favourite serialisation flavour.

It makes things like canonical representations for content-addressable hashing and signing pretty much impossible, which is a huge detriment to proper authentication and provenance tracking.
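To make the hashing problem concrete, here is a minimal Python sketch. The triples, the `ex:` names and the N-Triples-like text form are made up for illustration. It only shows the easy half of the problem (statement order); blank nodes and differing serialisation syntaxes are the hard part and are deliberately left out.

```python
import hashlib

# Hypothetical triples; the "ex:" identifiers are illustrative only.
triples_a = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", '"Alice"'),
]
# The same graph, with the statements written in a different order.
triples_b = list(reversed(triples_a))


def naive_hash(triples):
    """Hash the statements in whatever order they were written."""
    text = "\n".join(" ".join(t) + " ." for t in triples)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sorted_hash(triples):
    """Hash after sorting, a toy canonical form that ignores blank nodes."""
    return naive_hash(sorted(triples))


# Same graph, different bytes, different hash.
print(naive_hash(triples_a) == naive_hash(triples_b))
# Sorting fixes this simple case only.
print(sorted_hash(triples_a) == sorted_hash(triples_b))
```

Sorting is enough here only because every term has a stable name. Once blank nodes or multiple serialisations are in play, a simple sort no longer gives a stable hash.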

It also pulls in all of these other open-ended standards, where everything and anything is a valid subject identifier, so long as it's URI resolvable by http (which is pretty vague and random).

Subjects and predicates should have always just been 16-byte random UIDs, which at the same time would have delivered us from the bane of blank nodes, endless discussions on predicate names, and broken links.

The object part should also have been limited in the amount of data it can hold: just hash anything bigger and store it in some form of content-addressable blob store.
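A rough Python sketch of what such a triple could look like. The 32-byte inline limit, the use of SHA-256, the `is_blob` flag and the in-memory dict standing in for the blob store are all assumptions for illustration, not something specified in this thread.

```python
import hashlib
import os
from typing import Dict, NamedTuple

ID_BYTES = 16          # subject/predicate id size, as suggested above
MAX_INLINE_BYTES = 32  # assumed cap on inline object data

# Stand-in for a content-addressable blob store: digest -> content.
blob_store: Dict[bytes, bytes] = {}


class Triple(NamedTuple):
    subject: bytes
    predicate: bytes
    obj: bytes     # inline value, or the SHA-256 digest of a stored blob
    is_blob: bool  # True when obj is a digest pointing into blob_store


def new_id() -> bytes:
    """Return a fresh 16-byte random identifier."""
    return os.urandom(ID_BYTES)


def make_triple(subject: bytes, predicate: bytes, value: bytes) -> Triple:
    """Store small values inline; hash larger ones into the blob store."""
    if len(subject) != ID_BYTES or len(predicate) != ID_BYTES:
        raise ValueError("subject and predicate must be 16-byte ids")
    if len(value) <= MAX_INLINE_BYTES:
        return Triple(subject, predicate, value, False)
    digest = hashlib.sha256(value).digest()
    blob_store[digest] = value
    return Triple(subject, predicate, digest, True)


def resolve(triple: Triple) -> bytes:
    """Return the object's full value, fetching from the blob store if needed."""
    return blob_store[triple.obj] if triple.is_blob else triple.obj
```

With a 32-byte cap and a 32-byte digest, the object field never exceeds 32 bytes, so the whole triple stays small and bounded in size.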

> so long as it's URI resolvable by http

It doesn’t have to be. URI is a pretty broad concept and URLs are just a subset. It’s perfectly fine to identify an entity with other URIs that are not URLs. If you’re talking about a book, for instance, you could use the ISBN: “urn:isbn:0-486-27557-4”

The usefulness of using resolvable URLs as URIs is just that if you have absolutely no knowledge about the resource except its URI, and that URI happens to be a resolvable URL, then at least you know where to go looking to find out more.
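As a small illustration in Python, the standard library parses both kinds of identifier as URIs, but only one of them names a host you could contact. The helper name `looks_http_resolvable` and the `example.org` URL are made up here, and the check only inspects the syntax. It does not actually fetch anything.

```python
from urllib.parse import urlsplit


def looks_http_resolvable(uri: str) -> bool:
    """True if the URI is an http(s) URL with a host part (syntax check only)."""
    parts = urlsplit(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


isbn = urlsplit("urn:isbn:0-486-27557-4")
print(isbn.scheme)  # the scheme is "urn"
print(isbn.netloc)  # empty: there is no host to ask

print(looks_http_resolvable("urn:isbn:0-486-27557-4"))
print(looks_http_resolvable("https://example.org/books/hamlet"))
```

Both strings are perfectly valid identifiers. The second one just happens to also tell you where to look.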

The URI resolution idea is 99% crap.

That is, most of the time you don't want to publish subjects and predicates as resolvable URIs. However, people see so many examples of http:... that they don't realize it's even possible to make non-resolvable URIs.

I used random UUIDs all the time but that is a super-fraught area since some people really want them to be in temporal sequence so their database index is happy.

I've also done the content addressable blob store thing.

I've had some pretty good experiences with using the following random UID scheme.

A 32-bit millisecond timestamp that just rolls over, i.e. truncate(current_time_ms()), concatenated with 96 bits of crypto-grade entropy.

You get both nice properties: database index locality, plus enough entropy that you can sleep well and not worry about collisions. The entropy matters because the entire timestamp space gets more densely populated with every overflow (roughly every 50 days).
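A minimal Python sketch of that scheme, assuming the result is meant to be 16 bytes in total: a 4-byte timestamp followed by 12 random bytes (96 bits). The function name `make_uid`, the big-endian layout and the optional `now_ms` parameter are my own choices for illustration.

```python
import os
import struct
import time
from typing import Optional

# A 32-bit millisecond counter wraps after 2**32 ms.
ROLLOVER_DAYS = 2**32 / 86_400_000  # about 49.7 days


def make_uid(now_ms: Optional[int] = None) -> bytes:
    """Build a 16-byte UID: 32-bit truncated ms timestamp + 96 random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    ts32 = now_ms & 0xFFFFFFFF  # keep only the low 32 bits, so it rolls over
    # Big-endian, so byte-wise ordering follows time within one rollover period.
    return struct.pack(">I", ts32) + os.urandom(12)


uid = make_uid()
print(uid.hex())
print(len(uid))
```

Putting the timestamp first is what gives the index locality. IDs created close together in time share a prefix, while the random tail keeps them distinct.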

It's also what PostgreSQL uses for its index-friendly UID format. :D

I think proposed UUID v7 is sortable, FWIW.

https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...