by endisneigh 1611 days ago
tldr: Google sometimes uses headings instead of titles. Match them to prevent title rewrite; stop using long, verbose titles
5 comments

From a pure HTTP perspective, isn't the point of page titles to be how a page is referenced? It would be an error if a library reported the title of "A Tale of Two Cities" as "It Was The Best of Times".

> stop using long, verbose titles

This is good advice, but if Google wants to penalize bad titles it should dock their rank, not misreport them.

> isn't the point of page titles to be how a page is referenced

It is, but what would you do if all titles across pages just said "ACME Corp."? That happens often if the developer just displays SITE_NAME in the title.

In those cases it makes sense to show the person searching the web additional information from an H1 tag, which probably says something more specific, like "Contact us".
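
A minimal sketch of that fallback, assuming the crawler already has the page's title, its first H1, and the site name. The function name `choose_display_title` and the rule itself are hypothetical illustrations, not Google's actual logic.

```python
def choose_display_title(title: str, h1: str, site_name: str) -> str:
    """Pick the text to show in a result listing (hypothetical heuristic)."""
    title = title.strip()
    h1 = h1.strip()
    site_name = site_name.strip()
    # Treat the title as uninformative when it is empty or just the site name.
    generic = not title or title.casefold() == site_name.casefold()
    if generic and h1:
        # Keep the site name for context when there is one.
        return f"{h1} | {site_name}" if site_name else h1
    # Otherwise trust the author's title, falling back to the H1 if it is empty.
    return title or h1


print(choose_display_title("ACME Corp.", "Contact us", "ACME Corp."))
```

With a page whose title is only "ACME Corp." and whose H1 is "Contact us", this returns "Contact us | ACME Corp."; a descriptive title is passed through untouched.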

No thanks, as a Google user, I’m happy that Google is descriptive.

Ideally, Google tells me what the page actually contains. I.e. if you title the page “Top TVs of 2022” and you’re reviewing cars, then it titles it appropriately. Google can’t do that right now, but every step closer is a good thing for me.

There's lots of "isn't the point of..." in HTML that actual users have broken. Google (and other crawlers and intermediaries) have to adapt their algorithms to account for that.
> Google (and other crawlers and intermediaries) have to adapt their algorithms to account for that.

As I see it, Google's in a prime position to algorithmically reward actual users for better HTML discipline by ranking them above users who can't be bothered.

The Search Console is great at knowing if your site could use some improvements. They could easily[a] add a mark for “bad titles”.

[a]: “easily” because they already have logic to determine something needs rewriting

HTTP actually has nothing to do with page titles. I think web browsers should probably display the titles verbatim, but there may be use cases where they don't, a common one being where there isn't enough space so the title is truncated in the UI.

As for what search engines should do with page titles, it's really up to the individual search engine, I'd say. Whatever serves their users best.

As a search engine developer I totally get why. HTML in the wild is not well behaved in the slightest. People use title and heading tags in all manner of weird ways. I've seen <title>-tags in the <body>-tag used as headings. I've seen documents where every line was a <h1>-tag.

You kinda need to make the most of what you're given.
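
To make the two oddities above concrete, here is a rough sketch using Python's standard `html.parser` that counts <h1> tags and <title> tags found inside <body>. The class and function names are made up for illustration, and it assumes the page has an explicit <body> tag, which real-world HTML often omits.

```python
from html.parser import HTMLParser


class HeadingAudit(HTMLParser):
    """Count <h1> tags and <title> tags that appear inside <body>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_body = False
        self.h1_count = 0
        self.titles_in_body = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "body":
            self.in_body = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "title" and self.in_body:
            self.titles_in_body += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False


def audit(html: str) -> dict:
    """Return simple counts a crawler could use to flag odd markup."""
    parser = HeadingAudit()
    parser.feed(html)
    parser.close()
    return {"h1_count": parser.h1_count, "titles_in_body": parser.titles_in_body}
```

A crawler could treat a nonzero `titles_in_body` or a very high `h1_count` as a hint that the tags are not carrying their usual meaning on that page.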

HTML5 has been around for long enough that we should be able to punish sites that use completely bonkers markup at this point, right? Since Google effectively has historical archives of the internet, they could pretty trivially grandfather in legitimately old content (things they tracked before some date) and just start down-ranking sites that continue to misbehave with markup but skate by with browsers running in compatibility mode. Something like abusing <h1> tags is legal, if obnoxious, HTML and so it shouldn't really fall under this... but it's been long enough that we can start punishing completely incorrect syntax, right?
That would be a massive loss, though. A lot of content isn't in HTML5, and a lot of that pre-HTML5 content is precious and valuable.

Google has sadly already tossed a lot of that by the wayside, since it often isn't served with HTTPS. I think something like 80% of the sites my crawler is aware of serve pages over plain HTTP.

In general, attempts at shaping the web through search engine indexing requirements seem to mostly serve to filter out content made by humans and select for search engine marketing.

Not so sure older content (like the stuff I wrote in the late 90s to mid 00s) would be negatively impacted, so long as search providers pay careful attention to the <!DOCTYPE> tag (or lack thereof). I wouldn't characterize holding people to at least a bare minimum of standards (e.g., title in the head and nowhere else, which has been the rule since at least HTML 2.0 in 1994) as "punishment", any more than dinging them for unclosed parens and other typos. Language is how we communicate understanding, and markup is how we frame presentations on the web (mostly). People need to be prepared for the consequences of making it up as they go along rather than educating themselves on the standard (whether spelling, grammar or markup language).
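
One way a crawler might pay attention to the doctype, sketched with Python's standard `html.parser`. The three labels ("html5", "legacy", "none") and the names `DoctypeSniffer` and `classify_doctype` are assumptions made for this example, not anything a real search engine is known to use.

```python
from html.parser import HTMLParser


class DoctypeSniffer(HTMLParser):
    """Remember the first DOCTYPE declaration seen in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. "DOCTYPE html".
        if self.doctype is None and decl.lower().startswith("doctype"):
            # Collapse runs of whitespace so comparisons are stable.
            self.doctype = " ".join(decl.split())


def classify_doctype(html: str) -> str:
    """Return "html5", "legacy", or "none" (hypothetical labels)."""
    sniffer = DoctypeSniffer()
    sniffer.feed(html)
    sniffer.close()
    if sniffer.doctype is None:
        return "none"
    if sniffer.doctype.lower() == "doctype html":
        return "html5"
    return "legacy"
```

A ranking rule could then hold only the "html5" bucket to the stricter standard and leave the other two alone.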
That really doesn't seem to be what I'm seeing, having built a search engine specialized in this type of content and finding almost nothing but gems in the refuse.

If anything, it seems like the single best predictor of whether a website is a content mill is strict adherence to modern web standards and other "google rules".

I think it'd be pretty good to let in historical stuff on grace - and just start penalizing new content. Google absolutely has the tools to do this the right way, and the Internet Archive could allow most other folks to accomplish the same thing.

Enabling HTTPS is easy on most platforms. Folks that have rolled their own platform or got unlucky and are using a CMS that fell out of favor do tend to get screwed over by this - but I think it's fair to de-prioritize content that fails to adhere to good practices. The HTTP vs HTTPS debate in particular can be a real security concern - with tags it's more about paying down the tech debt in our browser technology.

I really wish browsers would stop shrugging their shoulders at bad markup and display blank pages with errors in the consoles or even visible in the rendered page. It would force devs to clean up their act. But as long as 1 browser vendor doesn't do it, the end users will all just assume the strict browser is broken since there is another browser that does "work".
On the website of the company I work for, the title is "tagline | company name" but in the search results it shows up as "company name: tagline". That style doesn't appear on the company website anywhere.

I imagine it's Google trying to normalize how things are shown but it's quite annoying. It could potentially break some company's branding.

Shrug. The almost-religious belief in the necessity for ultra-consistent branding within some companies is nearly comical so long as you're on the outside.
Sadly agreed. Definitely not comical when I have multiple times had our marketing department blame/throw fits at the dev team for the site not showing up in Google's search results exactly how they want it to.
To be fair to the devs, that's an education gap. The response should be "You want us to develop a solution to a third party's whims? Maybe you should try writing them a nice letter about how their representation of our company affects our image; it'll have as much impact. Possibly more."

In real corporations, of course, that's not how it works because the tech people are "wizards" and Google is "part of the wizard stuff," but this isn't a technical problem (and maybe marketing needs to stop trying to control another company; that's no more likely to succeed than Coke yelling at Amazon that they don't always put Coke products at the top of every search result).

It's relatively minor for most businesses, but sometimes it isn't. Inconsistent messaging makes it a lot easier for someone to set up a phishing attack against your customers. My bank uses several different URLs, email sending addresses, and taglines for its services. It's not always easy to tell if an email is actually from the bank.

Google adding more permutations into the mix doesn't help.

Google changing the way title tags are formatted on their SERP is not the reason that your bank's customers are falling for phishing attacks.
Of course not. I didn't say it is. It's a whole bunch of things. Google changing things is one very small factor.

But it is a factor...

Related, but I dislike when I'm bookmarking a page and the title is one word - the name of the product or the company. It makes it hard to search for it later.
Specifically, make your title and H1 match exactly and aim for a character length of around 51-60.
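
A quick way to check a page against that advice, as a sketch only. The function name `check_title` is hypothetical, and the 51-60 range is simply the figure quoted above, passed in as defaults.

```python
def check_title(title: str, h1: str, min_len: int = 51, max_len: int = 60) -> list[str]:
    """Return a list of problems; an empty list means the page follows the advice."""
    problems = []
    title = title.strip()
    # The advice is an exact match, so compare the text as-is after trimming.
    if title != h1.strip():
        problems.append("title and H1 differ")
    length = len(title)
    if not min_len <= length <= max_len:
        problems.append(f"title is {length} characters, outside {min_len}-{max_len}")
    return problems
```

For a page whose title and H1 are both "Home", this reports only the length problem.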
Not everything needs a length of 51-60 characters. Instead of "Home" use "This is the starting page for this website" ;)