Hacker News
by AndrewStephens 2376 days ago
I have some low-level hate for the Semantic Web. I run a small personal blog, built with a relatively simple static site generator I wrote that turns Markdown files into clean(ish) HTML.

A couple of months ago I got interested in adding semantic information to my posts, so I modified the generator to add some of the common semantic tags. It was an annoying job, since the semantic information pollutes the structure of the html.

Can anyone tell me what the semantic web does for me as a small-time publisher? Is it for search engines? Does it really matter that a book review (for instance, I have a few) is tagged properly?


> Can anyone tell me what the semantic web does for me as a small-time publisher? Is it for search engines?

Yes, in practice it is mostly for bigger fish in the pond to easily identify and steal your content as needed.

For example, Google was using reviews from small competitors' sites in Google Shopping.

I think this is one of the big issues. The semantic information does make it easier for end users to find what they're looking for, but it also makes denial of traffic possible.

In a lot of cases, the information was there to get eyeballs, so this is undesirable.

I guess if you don't really care about the eyeballs, it can be "useful" for the big fish to pay most of the cost of serving the fraction of your server response that the end user was looking for...

So the root problem is actually that people care about the eyeballs. Nothing good comes from such an incentive.
Maybe. Not sure what I think about that framing.

FWIW, I was using "eyeballs" to mean something wider than just ad revenue. I think ads are the big share, but I'm sure there are people/orgs who want eyeballs for other reasons, like ego/status or promoting their company/brand/service/products.

In some sense I think your framing is accurate, but I don't know whether we'd be better off (i.e., have a more net-positive information ecosystem?) without status chasers. Some share of them inevitably game the system and dilute the ecosystem; others probably add net value in pursuit of eyeballs?

In the context of the semantic web, the pursuit of eyeballs is a problem because it makes the people who own or create the data also want to deliver that data directly to users, and to be the only ones allowed to do so. The semantic web works toward the opposite goal: allowing data to be automatically transmitted, processed and understood by software, and only eventually delivered in some form to the end user.

As for building a more net-positive information ecosystem, going for eyeballs instead of genuinely caring about delivering good information isn't bad per se, just suboptimal. It's better for an eyeball-chasing site to publish some information than for that information not to be published at all. But when eyeballs are your primary revenue source, you will work hard to make the data as useless as possible outside your own publication, which leads to a very unhealthy information ecosystem.

They don't even care about that. They care about their advertising revenue.
More of a side note, but if you run a blog you might know that the trackback URL can be specified via an RDF tag. That's a kind of semantic information, and one example of one type of use: giving other clients (here, other blogs) additional information (here, where to send the Trackback POST).
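A minimal sketch of that autodiscovery flow, assuming the page follows the TrackBack convention of an `rdf:Description` carrying a `trackback:ping` attribute, embedded in the post's HTML (often inside an HTML comment). The sample HTML, example.com URLs and helper names are hypothetical, and a regex is only adequate for a toy like this. It builds the form-encoded POST body but does not send anything.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlencode

# Hypothetical post page with TrackBack autodiscovery RDF hidden in an HTML comment.
SAMPLE_HTML = """<html><body>
<!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <rdf:Description
      rdf:about="https://example.com/posts/book-review"
      dc:identifier="https://example.com/posts/book-review"
      dc:title="A book review"
      trackback:ping="https://example.com/trackback/42" />
</rdf:RDF>
-->
<p>Post body...</p>
</body></html>"""


def find_trackback_ping(html: str) -> Optional[str]:
    """Return the trackback:ping URL from the first embedded rdf:RDF block, or None."""
    block = re.search(r"<rdf:RDF.*?</rdf:RDF>", html, re.DOTALL)
    if block is None:
        return None
    ping = re.search(r'trackback:ping="([^"]+)"', block.group(0))
    return ping.group(1) if ping else None


def trackback_body(url: str, title: str, excerpt: str, blog_name: str) -> str:
    """Form-encoded body for the ping (sent as application/x-www-form-urlencoded)."""
    return urlencode({"url": url, "title": title, "excerpt": excerpt, "blog_name": blog_name})


ping_url = find_trackback_ping(SAMPLE_HTML)
body = trackback_body("https://other.example/reply", "My reply", "Short excerpt", "Other Blog")
print(ping_url)        # https://example.com/trackback/42
print(parse_qs(body))  # decodes back to the four submitted fields
```

The point is the division of labour: the replying blog never has to guess the endpoint, because the original post states it in machine-readable form.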

As for the markup you added, it depends on what exactly you did. Did you add schema.org markup? In practice that's solely for Google. The SEO promise there is that Google will use the information provided and display some of it nicely in the results, which can lead to more clicks. https://moz.com/learn/seo/serp-features explains that reasonably well. For things like reviews I can imagine it being quite useful.
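For the book-review case, schema.org markup doesn't have to be woven into the HTML as attributes; it can also live in a single JSON-LD `<script>` block, which sidesteps much of the "pollutes the structure" complaint. A sketch, assuming the schema.org `Review`, `Book`, `Person` and `Rating` types; the function name and sample values are hypothetical, and whether a search engine actually renders anything from it is entirely up to the search engine.

```python
import json


def book_review_jsonld(book_title, book_author, isbn, rating, reviewer,
                       date_published, best_rating=5):
    """Build a <script> tag carrying schema.org Review markup for a book, as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Review",
        "itemReviewed": {
            "@type": "Book",
            "name": book_title,
            "author": {"@type": "Person", "name": book_author},
            "isbn": isbn,
        },
        "reviewRating": {"@type": "Rating", "ratingValue": rating, "bestRating": best_rating},
        "author": {"@type": "Person", "name": reviewer},
        "datePublished": date_published,
    }
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value such as "</script>" cannot close the tag early;
    # "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


tag = book_review_jsonld("Example Book", "Jane Author", "9780306406157", 4,
                         "A. Blogger", "2019-06-01")
print(tag)
```

Because the metadata sits in one block, a static site generator could emit it from a post's front matter without touching the rendered body at all.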

> Does it really matter that a book review (for instance, I have a few) is tagged properly?

If the semantic web was better supported, you could have a semantic annotation precisely identifying the books you are reviewing (whether by ISBN edition or otherwise), and reusers of your content (users, search engines or others) would be able to programmatically associate your review with similar content.
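As a rough illustration of "programmatically associate", here is a sketch of a reuser grouping reviews from different sites by the book they annotate. It assumes each review's markup exposes an ISBN (for example via schema.org's `isbn` property on a `Book`); the record format, URLs and function names are hypothetical. Since sites may write either ISBN-10 or ISBN-13, the sketch normalizes both to ISBN-13 first.

```python
from collections import defaultdict
from typing import Dict, List


def isbn13_check_digit(first12: str) -> str:
    """ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... over the first 12 digits."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> str:
    """Return the 13-digit form of an ISBN-10 or ISBN-13 (hyphens and spaces ignored).

    Sketch only: the input's own check digit is not validated.
    """
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 13 and s.isdigit():
        return s
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        core = "978" + s[:9]  # ISBN-10s map into the 978 prefix
        return core + isbn13_check_digit(core)
    raise ValueError(f"not an ISBN: {raw!r}")


def group_by_book(reviews: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group review URLs by the edition they claim to review."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for review in reviews:
        groups[normalize_isbn(review["isbn"])].append(review["url"])
    return dict(groups)


# Hypothetical annotations harvested from two blogs, one using ISBN-10 and one ISBN-13.
reviews = [
    {"url": "https://blog-a.example/review", "isbn": "0-306-40615-2"},
    {"url": "https://blog-b.example/notes", "isbn": "978-0-306-40615-7"},
]
print(group_by_book(reviews))
# {'9780306406157': ['https://blog-a.example/review', 'https://blog-b.example/notes']}
```

Because an ISBN identifies an edition rather than a work, this only merges reviews of the same edition; grouping different editions of one book would need an additional work-level identifier.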

That seems like it would be abused to the point of the semantic information being completely useless.
I guess by abuse you mean ~black-hat SEO?

It seems likely (and perhaps obvious) that:

- people will try to abuse it

- abuse will keep it from supporting naive trust of semantic information published by untrusted third parties

But we're also already roughly in this scenario, and it seems like it might be easier to model and spot/discard abuse of semantic information.
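To illustrate why abuse of structured claims might be easier to spot than abuse of free text: once a claim is machine-readable, it can be checked mechanically against itself and against the visible page. The function below is a hypothetical heuristic, not how any search engine is known to work; it assumes the claim has already been parsed into a dict shaped like schema.org `Review` markup.

```python
from typing import Dict, List


def review_markup_problems(claim: Dict, visible_text: str) -> List[str]:
    """Hypothetical sanity checks on a parsed schema.org Review claim.

    Flags claims that contradict themselves or the page a human would see.
    """
    problems = []
    rating = claim.get("reviewRating", {})
    # schema.org assumes bestRating 5 and worstRating 1 when they are omitted.
    best = float(rating.get("bestRating", 5))
    worst = float(rating.get("worstRating", 1))
    value = rating.get("ratingValue")
    if value is None:
        problems.append("no ratingValue")
    elif not worst <= float(value) <= best:
        problems.append("ratingValue outside worstRating..bestRating")
    name = claim.get("itemReviewed", {}).get("name", "")
    if not name or name.lower() not in visible_text.lower():
        problems.append("reviewed item is not mentioned in the visible text")
    return problems


honest = {"itemReviewed": {"name": "Example Book"}, "reviewRating": {"ratingValue": 4}}
inflated = {"itemReviewed": {"name": "Miracle Pill"}, "reviewRating": {"ratingValue": 11}}
page_text = "My thoughts on Example Book: solid, if a little long."
print(review_markup_problems(honest, page_text))    # []
print(review_markup_problems(inflated, page_text))  # two problems flagged
```

Checks like these only catch sloppy abuse; a careful spammer can make the markup and the visible text agree, which is why naive trust in third-party claims still doesn't work.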

> It was an annoying job, since the semantic information pollutes the structure of the html.

In what way? Both the HTML and the metadata are intended to make your website machine-friendly. You may find the HTML structure polluted, but crawlers would find it more informative.

Embedding semantic information would allow Google to further refine search traffic to your web page. I assume it may also make you more authoritative with respect to the content you publish.
"Semantic Web" is a wide area. What technologies did you use? Care to post a small example of what pollutes the HTML structure, and how?
I can't imagine what semantic tags would pollute a blog's markup, as most of them were designed to structure simple text content like a blog post. Do you have any examples?

> Is it for search engines?

Yes, and accessibility.

I think you might be confusing semantic HTML with the semantic web. (Which is understandable given the mention of semantic tags.)

Using semantic HTML means using <article> rather than yet another <div>. What GP is referring to, however, is adding extra information to your HTML detailing what kind of data is in your tags, e.g.:

    <p vocab="http://schema.org/" typeof="Person">
      <span property="name">Christopher Froome</span> was sponsored by
      <span property="sponsor" typeof="http://schema.org/Organization">
        <a property="url" href="http://www.skysports.com/">Sky</a></span> in the Tour de France.
    </p>
Here, the vocab, typeof and property attributes are used to add semantic information to the HTML. It might also give you an idea of why one might consider that a chore, especially if it doesn't appear to provide any benefit, like making your site accessible to users of screen readers.
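To make "semantic information" concrete: software can lift the snippet above into a small data structure without understanding any of the prose around it. The sketch below is a toy reader for only the `vocab`/`typeof`/`property` subset of RDFa Lite, built on Python's standard `html.parser`; it is not a conformant RDFa processor (no prefixes, no `resource`, no multi-valued properties), and the class name is made up.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class ToyRDFaLiteReader(HTMLParser):
    """Toy reader for the vocab/typeof/property subset of RDFa Lite.

    Not a conformant RDFa processor: prefixes, resource and multi-valued
    properties are ignored, and non-void elements must be explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self.open_elements = []  # one record per currently open element
        self.items = []          # stack of items (dicts) currently in scope
        self.vocabs = []         # active vocab URI for each open element
        self.root = None         # first top-level item found

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        a = dict(attrs)
        vocab = a.get("vocab") or (self.vocabs[-1] if self.vocabs else "")
        self.vocabs.append(vocab)
        new_item = None
        if "typeof" in a:
            t = a["typeof"]
            # Bare type names like "Person" are resolved against the active vocab.
            new_item = {"@type": t if "://" in t else vocab + t}
        parent = self.items[-1] if self.items else None
        self.open_elements.append({"prop": a.get("property"), "parent": parent,
                                   "item": new_item, "href": a.get("href"), "text": []})
        if new_item is not None:
            if parent is None and self.root is None:
                self.root = new_item
            self.items.append(new_item)

    def handle_data(self, data):
        # A property's text value is all text inside its element, so every open element collects it.
        for record in self.open_elements:
            record["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_elements:
            return
        record = self.open_elements.pop()
        self.vocabs.pop()
        if record["item"] is not None:
            self.items.pop()
        if record["prop"] and record["parent"] is not None:
            if record["item"] is not None:
                value = record["item"]   # nested item, e.g. the sponsor Organization
            elif record["href"]:
                value = record["href"]   # links contribute their target
            else:
                value = "".join(record["text"]).strip()
            record["parent"][record["prop"]] = value  # later values overwrite earlier ones


SNIPPET = """<p vocab="http://schema.org/" typeof="Person">
  <span property="name">Christopher Froome</span> was sponsored by
  <span property="sponsor" typeof="http://schema.org/Organization">
    <a property="url" href="http://www.skysports.com/">Sky</a></span> in the Tour de France.
</p>"""

reader = ToyRDFaLiteReader()
reader.feed(SNIPPET)
reader.close()
print(reader.root)
# {'@type': 'http://schema.org/Person', 'name': 'Christopher Froome',
#  'sponsor': {'@type': 'http://schema.org/Organization', 'url': 'http://www.skysports.com/'}}
```

Note that the sentence fragments "was sponsored by" and "in the Tour de France." never make it into the result: only the annotated spans do, which is exactly the extra information the attributes buy.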
You're right, I was conflating the two overlapping concepts.