by crazygringo 4666 days ago
Question: is there any evidence that search engines actually use/respect semantic tags like <main>, <footer>, <article>, etc.? Or anyone here who knows firsthand from Google?

Just because, if I were writing a search engine, I would already have a bunch of "AI"/heuristics logic to tease these things out, since most sites don't use semantic HTML5 -- and it would probably do a solid job, since it's easy to compare a bunch of pages from a single site and figure out what parts are changing.
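To make the "compare a bunch of pages from a single site" idea concrete, here's a rough sketch in Python. It is only an illustration of the general approach, not how any real search engine works: the function names (`split_blocks`, `changing_blocks`), the `max_share` threshold, and the sample pages are all made up for the example, and it treats each line of text as one "block".

```python
from collections import Counter

def split_blocks(page_text):
    """Split a page into non-empty, whitespace-normalized text blocks (one per line)."""
    return [" ".join(line.split()) for line in page_text.splitlines() if line.strip()]

def changing_blocks(pages, max_share=0.5):
    """For each page, keep only blocks that appear on at most max_share of the pages.

    Blocks repeated across most pages (headers, nav, footers) are treated as
    boilerplate; whatever is left is a guess at the page-specific content.
    """
    block_lists = [split_blocks(p) for p in pages]
    # Count each block once per page, so repeats inside one page don't inflate it.
    page_counts = Counter(b for blocks in block_lists for b in set(blocks))
    limit = max_share * len(pages)
    return [[b for b in blocks if page_counts[b] <= limit] for blocks in block_lists]

# Hypothetical pages from one site: same header, nav, and footer, different body.
pages = [
    "Acme News\nHome | About\nCats learn to code\nCopyright Acme",
    "Acme News\nHome | About\nDogs ship a compiler\nCopyright Acme",
    "Acme News\nHome | About\nBirds review pull requests\nCopyright Acme",
]

print(changing_blocks(pages))
# [['Cats learn to code'], ['Dogs ship a compiler'], ['Birds review pull requests']]
```

Note that nothing here looks at tag names at all, which is the point: the repeated parts fall out from the comparison whether or not the site marks them up as <footer> or <nav>.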

Then, if I actually started assuming that <main> or <article> was always the main/article part, it would make it easier for people to "game" the search engine with keyword-stuffing, etc. So, if I ran a search engine, I'd probably just ignore them completely and rely on my own heuristics.

(For example, Google completely ignores HTML language attributes: "Keep in mind that Google ignores all code-level language information, from “lang” attributes to Document Type Definitions (DTD). Some web editing programs create these attributes automatically, and therefore they aren’t very reliable when trying to determine the language of a webpage." [1] So I wouldn't be surprised if semantic HTML is the same deal.)

I've heard it endlessly repeated that semantic HTML helps SEO, and that's why you should use it. But I've never seen concrete evidence of this -- is there anything that actually backs it up?

[1] http://googlewebmastercentral.blogspot.com/2010/03/working-w...

4 comments

From what I understand, the short answer is "No, they don't". The long answer is "They don't yet, but if we keep telling people they do and everyone starts using semantic HTML, they will".
> and everyone starts using semantic HTML, they will

Eh, I doubt it. The problem with the "semantic web" is that your semantics might not match my semantics. You might use <article> only for the main content of an article-like page, whereas I might use it for each separate piece of text.

A few years ago I worked on a digital library system and we would sweat over including proper metadata about articles in meta tags (derived from the Dublin Core attributes entered by the authors). That was until we met with someone at Google who worked on Scholar. He said there was no harm in including that metadata, but they usually had more success inferring things like authors, title, etc. from the HTML content of the page, so they ignored it for the purposes of indexing. Things may have changed, but basically I think you're right that semantic markup is easy to game, so it probably doesn't affect SEO. Doesn't mean it's not worth doing though :)
Off-topic, but I'm not sure <article> means what you think it means. It could be an article like in a magazine, but according to the spec it's for any "self-contained composition in a document" like a widget or a comment. Each page could have many <article>s. It's definitely not intended to indicate "this is the main content of the page."

Lots of people are confused though. This is one of the many problems with semantic markup in the real world.
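A quick way to see the "many <article>s per page" point is to count tags with Python's standard `html.parser`. This is just a sketch: the `TagCounter` class and the sample markup are invented for the example, with a post whose comments are marked up as nested <article> elements.

```python
from collections import Counter
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Counts how many times each start tag appears in the fed HTML."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        self.counts[tag] += 1

# Hypothetical page: one post plus two comments, each its own <article>.
sample = """
<main>
  <article><h1>Post</h1><p>Body text.</p>
    <article><p>First comment</p></article>
    <article><p>Second comment</p></article>
  </article>
</main>
"""

counter = TagCounter()
counter.feed(sample)
print(counter.counts["article"], counter.counts["main"])
# 3 1
```

So a crawler that treated every <article> as "the main content" would have three candidates on this one page, while <main> appears once.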

In this case, it's more about accessibility with WAI-ARIA than it is about SEO.