by CaptArmchair 2293 days ago
I started reading the article with much interest... up until the bit about the Semantic Web. Then I felt things went downhill.

> One such effort was the Semantic Web. The dream was to create a Resource Description Framework (editorial note: run away from any team which seeks to create a framework), which would allow metadata about content to be universally expressed. For example, rather than creating a nice web page about my Corvette Stingray, I could make an RDF document describing its size, color, and the number of speeding tickets I had gotten while driving it.

> This is, of course, in no way a bad idea. But the format was XML based, and there was a big chicken-and-egg problem between having the entire world documented, and having the browsers do anything useful with that documentation.

The author completely fails to describe the evolution of the SemWeb over the past 10 years. Tons of specs, several declarative languages, and technologies have been developed, not just to get beyond the verbosity of a serialization format such as XML, but also to move away from the classic relational data model.

Turtle, JSON-LD, SPARQL, Neo4J, Linked Data Fragments,... come to mind. And then there are the emerging applications of linked data. If anything, the Federated Web is exactly about URLs and semantic web technologies based on linking and contextualizing data.

The entire premise of Tim Berners-Lee's Solid/Inrupt is based on these standards, including URIs.

Linked data and federation aren't just about challenging social media; they're also about creating knowledge graphs - such as wikidata.org - and creating opportunities for things such as open access and open science.

Then there's this:

> httpRange-14 sought to answer the fundamental question of what a URL is. Does a URL always refer to a document, or can it refer to anything? Can I have a URL which points to my car?

> They didn’t attempt to answer that question in any satisfying manner. Instead they focused on how and when we can use 303 redirects to point users from links which aren’t documents to ones which are, and when we can use URL fragments (the bit after the ‘#’) to point users to linked data.

Err. They did.

That's what the Resource Description Framework is all about. It gives you a few foundational building blocks for describing the world. Even more so, URIs have absolutely NOTHING to do with HTTP status codes. It just so happens that HTTP leverages URIs and creates a subset called HTTP URLs that allows the identification and dereferencing of web-based resources.

You can use URIs as globally unique identifiers in a database. You could use URNs to identify books. For instance, urn:isbn:0451450523 is an identifier for the 1968 novel The Last Unicorn.
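To make the identifier-versus-locator point concrete, here is a minimal sketch using Python's standard `urllib.parse`. The `describe` helper and the example.org address are made up for illustration; only the ISBN URN comes from the comment above.

```python
from urllib.parse import urlsplit

def describe(uri: str) -> dict:
    """Split a URI and note whether it is something HTTP could dereference."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        # Only http(s) URIs carry instructions for fetching a representation.
        "dereferenceable_over_http": parts.scheme in ("http", "https"),
        # Everything after the scheme: authority plus path, or just the path.
        "rest": parts.netloc + parts.path,
    }

book = describe("urn:isbn:0451450523")                     # a name, nothing to fetch
page = describe("https://example.org/books/last-unicorn")  # hypothetical URL
print(book)
print(page)
```

Both strings are valid URIs and both work fine as database keys; only the second one also tells a client where to send a request.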

So, this is a false claim. I could forgive them for inadvertently not looking beyond URLs as a mechanism used within the context of HTTP communication.

> In the world of web applications, it can be a little odd to think of the basis for the web being the hyperlink. It is a method of linking one document to another, which was gradually augmented with styling, code execution, sessions, authentication, and ultimately became the social shared computing experience so many 70s researchers were trying (and failing) to create. Ultimately, the conclusion is just as true for any project or startup today as it was then: all that matters is adoption. If you can get people to use it, however slipshod it might be, they will help you craft it into what they need. The corollary is, of course, no one is using it, it doesn’t matter how technically sound it might be. There are countless tools which millions of hours of work went into which precisely no one uses today.

I'm not even sure what the conclusion is here. Did the 'hyperlink' fail? Did the concept of a 'URI' fail? (They are different things!) Because neither failed. On the contrary!

Then there's this wonky comparison of the origin of the Web with a single project or a startup. The author did all this research on the history of the URI, but they still failed to see that the Internet and the Web were invented by committee and by coincidence. Pioneers all over the place had good ideas; some coalesced and succeeded, others didn't. Some, such as Basic Auth, were adapted to work together in a piecemeal fashion.

And that's totally normal. Organic growth and distributed development are the baseline. Yes, the Web as we know it today is the result of many competing voices, but at the same time it could only work if everyone ended up agreeing on the basics.

The fact of the matter is that some companies - looking at you, FAANG - would rather have us all locked into closed, black-box ecosystems than have open standards around that allow for interoperability and thus create opportunities for new threats to challenge their business interests.

I understand that the article is written by CloudFlare, a CDN company with its own interests. But I'm trying to wrap my head around how the author failed to address exactly these future opportunities and threats after this entire exposé.

4 comments

> I understand that the article is written by CloudFlare, a CDN company with its own interests.

Not sure what you mean by that, but Zack wasn't writing something to further some secret interest Cloudflare has in the structure of URLs.

It's always prudent to be circumspect about the motives of writers of what is essentially marketing content.
I think the operative phrase is 'a CDN company'.

URLs are names for things (companies, mailboxes, pictures of cats), but they're also (encoded) directions to get representations of those named things.

CloudFlare is mostly concerned with the latter, the mechanics. Things like Wikidata, knowledge bases, and schema.org are interested in the former perspective.

Cloudflare has Edge Workers and lots of other technologies that would be helpful to the semantic web.

Anything that is URL addressable is great for Cloudflare.

> The author completely fails to describe the evolution of the SemWeb over the past 10 years. Tons of specs, several declarative languages, and technologies have been developed, not just to get beyond the verbosity of a serialization format such as XML, but also to move away from the classic relational data model.

> Turtle, JSON-LD, SPARQL, Neo4J, Linked Data Fragments,... come to mind. And then there are the emerging applications of linked data. If anything, the Federated Web is exactly about URLs and semantic web technologies based on linking and contextualizing data.

I think that the author addressed this very well:

> There is a popular perception that the internet standards bodies didn’t do much from the finalization of HTTP 1.1 and HTML 4.01 in 2002 to when HTML 5 really got on track. This period is also known (only by me) as the Dark Age of XHTML. The truth is though, the standardization folks were fantastically busy. They were just doing things which ultimately didn’t prove all that valuable.

> One such effort was the Semantic Web.

Most of the things you listed were developed in that period. I'll make a partial exception for JSON-LD because - as the author of that standard himself says:

> So screw it, we thought, let’s create a graph data model that looks and feels like JSON, RDF and the Semantic Web be damned.

and

> I hate the narrative of the Semantic Web because the focus has been on the wrong set of things for a long time.

[1] http://manu.sporny.org/2014/json-ld-origins-2/

It's fair to say that there's a vast difference between assessing the usefulness of the technical output in this day and age on the one hand, and looking at past context - the decision processes, incumbents, power dynamics,... - in which that output was established.

> There is a popular perception that the internet standards bodies didn’t do much from the finalization of HTTP 1.1 and HTML 4.01 in 2002 to when HTML 5 really got on track. This period is also known (only by me) as the Dark Age of XHTML.

I think that's hindsight bias talking.

Who knew at the time how the next 20 years would play out? Google was just in its infancy. Internet Explorer dominated the browser market, and the same concerns - vendor lock-in and proprietary protocols - were just as much a thing back then as they are today.

HTML5 could emerge because of the wide adoption of XHTML and web standards by developers and designers. Not despite the existence of XHTML. The latter is just heavily colored value attribution on the part of the author.

> The truth is though, the standardization folks were fantastically busy. They were just doing things which ultimately didn’t prove all that valuable.

I think this applies to literally any sizable enterprise, as rising complexity diminishes predictability. The only way to find out whether or not a complex enterprise is valuable is... by going down that road and testing your ideas.

The implication made here is a take against standards bodies not following market dynamics - doing market research; following dominant technologies - but instead imposing their own principled vision on a market.

But that's a false dichotomy. If anything, standards bodies are committees made up in part of people who are affiliated with or represent incumbents in the marketplace, and in part of people who come from interest groups outside of commercial ventures, such as academia, research, public governance, and so on.

The output of a standards body is by definition a compromise that isn't tailored to the specific needs and wants of a single actor. That's actually a good thing.

> I hate the narrative of the Semantic Web because the focus has been on the wrong set of things for a long time.

The author is correct. The RDF spec has a lot of shortcomings. And the Semantic Web discussion was a difficult debate for a long time because things hadn't coalesced into a clear vision. And that's not a bad thing.

Context matters. At that time, nobody knew what the SemWeb was supposed to become or what it would evolve into. It was simply an idea, and there were a few tentative attempts to work in a problem space that wasn't fully charted yet. It's hard to navigate if you don't know the lay of the land, right?

This blog post was written when the final recommendation of JSON-LD was published. And that specification could only emerge after it was clear that the direction of the debate wasn't leading anywhere.

All I see is a normal evolution of things in an R&D context. By the same token, you could argue that the telegraph was a useless device because usage declined and nobody is using that technology anymore. But then you'd disregard the fact that the existence and use of telegraphs inspired others to create improvements such as the telephone or the radio.

> All I see is a normal evolution of things in an R&D context.

Which is fine, except the premature standardization approach used by Semantic Web technologies destroyed any chance they had of working.

> HTML5 could emerge because of the wide adoption of XHTML and web standards by developers and designers. Not despite the existence of XHTML. The latter is just heavily colored value attribution on the part of the author.

Actually, no. HTML5 forked from HTML4, not XHTML, because the W3C had a different vision.

This isn't just the author's view: I followed the mailing list and it's pretty well understood.

See, for example, the "A Competing Vision" section in https://diveinto.html5doctor.com/past.html

Thanks for reading and for your careful analysis. My perspective lives in the last paragraph of the post and I'll let that stand.

>> They didn’t attempt to answer that question in any satisfying manner. Instead they focused on how and when we can use 303 redirects to point users from links which aren’t documents to ones which are, and when we can use URL fragments (the bit after the ‘#’) to point users to linked data.

> Err. They did.

> That's what the Resource Description Framework is all about. It gives you a few foundational building blocks for describing the world. Even more so, URIs have absolutely NOTHING to do with HTTP status codes. It just so happens that HTTP leverages URIs and creates a subset called HTTP URLs that allows the identification and dereferencing of web-based resources.

> You can use URIs as globally unique identifiers in a database. You could use URNs to identify books. For instance, urn:isbn:0451450523 is an identifier for the 1968 novel The Last Unicorn.

> So, this is a false claim. I could forgive them for inadvertently not looking beyond URLs as a mechanism used within the context of HTTP communication.

See, this is almost the canonical example of why the semantic web remains the once and always future of the web.

Take this:

> Even more so, URIs have absolutely NOTHING to do with HTTP status codes. It just so happens that HTTP leverages URIs and creates a subset called HTTP URLs that allows the identification and dereferencing of web-based resources.

Sure, URIs are just the addressing scheme. I think we all get that. But the practicalities of building systems mean that applications have to understand both the addressing scheme and some way of handling errors, which status codes supply. Notably, all implementations of URIs (HTTP, files, IPFS) have to implement error handling themselves.

The holistic approach that the (non-semantic) web took in evolving the browser, HTML, and HTTP together meant that practical applications could be built on it.

Contrast that to the ideological approach of the semantic web, where - yes, Resource Description Framework (RDF) gives you addresses, but it's a weak data modelling approach that would be ignored if it was in a programming language (eg, the lack of list support! - see [1])
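For anyone who hasn't run into the list problem, here is a rough sketch of how an ordered list ends up as triples in RDF, as I understand RDF collections: a chain of `rdf:first` / `rdf:rest` nodes terminated by `rdf:nil`. The `list_to_triples` helper and the `_:b0`-style blank node labels are invented for illustration and aren't taken from any library.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def list_to_triples(items, prefix="_:b"):
    """Encode a Python list as an rdf:first / rdf:rest chain of triples."""
    if not items:
        return RDF + "nil", []
    # One made-up blank node per list cell.
    nodes = [f"{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        # The last cell points at rdf:nil instead of another cell.
        nxt = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", item))
        triples.append((nodes[i], RDF + "rest", nxt))
    return nodes[0], triples

head, triples = list_to_triples(["a", "b", "c"])
for t in triples:
    print(t)
```

Three items turn into six triples and three intermediate nodes, where JSON would just write `["a", "b", "c"]`. That is roughly the ergonomic gap being complained about here.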

Anyway, to go to your original point: the original httpRange-14 was in the context of HTTP URIs, but the issue equally applies to non-HTTP URIs. At least for HTTP we can discuss it sensibly because status codes are part of the spec. For URIs in a general sense it seems impossible to resolve this (no pun intended).
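As a sketch of why the HTTP case is at least discussable, the httpRange-14 resolution (as I read it) boils down to a small decision table over status codes, with hash URIs as the escape hatch. The `classify` function and its return labels are my own shorthand, not anything from the spec.

```python
from urllib.parse import urldefrag

def classify(uri: str, status: int) -> str:
    """Rough reading of the httpRange-14 resolution for one HTTP lookup."""
    _, fragment = urldefrag(uri)
    if fragment:
        # Hash URIs sidestep the question: the fragment is never sent
        # to the server, so the identifier can name anything.
        return "hash-uri"
    if 200 <= status < 300:
        # A 2xx answer means the URI identifies a document-like thing.
        return "information-resource"
    if status == 303:
        # See Other: the URI may name anything (a car, a person);
        # the redirect target describes it.
        return "any-resource"
    if 400 <= status < 500:
        return "unknown"
    return "not-covered"

print(classify("http://example.org/my-car", 303))
print(classify("http://example.org/about", 200))
print(classify("http://example.org/doc#car", 200))
```

There is no equivalent table for `urn:` or other schemes, since there is no lookup step whose outcome you could inspect.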

[1] See Decision 3 in http://manu.sporny.org/2014/json-ld-origins-2/ (or read the whole article. It's good).