Hacker News
by greglindahl 4119 days ago
The client site, http://www.autoaccessoriesgarage.com, is engaging in cloaking.

Go to http://www.autoaccessoriesgarage.com/Seat-Covers/

Use the picker to pick a particular make and model: http://www.autoaccessoriesgarage.com/Seat-Covers/_Acura-RDX?...

So far so good, no problem. Your browser now has a cookie that says you're interested in just this make and model. Now for the problem: use the nav links to go to "Cargo trunk liners", and where do you land?

http://www.autoaccessoriesgarage.com/Cargo-Trunk-Liners

That's cloaking -- the URL is the generic one, but the page isn't showing you all of the liners, just the ones relevant to the make and model you picked earlier. Instead, the site should add _Acura-RDX?year=2008 to the URL, just like before.
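A minimal sketch of that suggestion, assuming the `/<Category>/_<Make>-<Model>?year=<year>` pattern seen on the Seat-Covers page also applies to other categories. The `category_url` helper and its parameters are hypothetical, not the site's actual code:

```python
from urllib.parse import urlencode

BASE = "http://www.autoaccessoriesgarage.com"


def category_url(category, make=None, model=None, year=None):
    """Build a nav link that carries the vehicle filter in the URL itself.

    Hypothetical helper: the path layout is inferred from the
    Seat-Covers example and is an assumption about the site.
    """
    url = f"{BASE}/{category}"
    if make and model:
        # The filter lives in the path, so the URL alone determines the content.
        url += f"/_{make}-{model}"
        if year is not None:
            url += "?" + urlencode({"year": year})
    return url


# Unfiltered link: the page lists every liner.
print(category_url("Cargo-Trunk-Liners"))
# Filtered link: same content for a crawler and for a user with a cookie.
print(category_url("Cargo-Trunk-Liners", "Acura", "RDX", 2008))
```

With links built this way, the cookie can still remember the selection, but the page content no longer depends on it.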

Why do search engines care about this stuff? Imagine you type [auto accessories cargo trunk liners] into your favorite search engine, and the result is http://www.autoaccessoriesgarage.com/Cargo-Trunk-Liners ... what does the search engine think you'll see? It has no idea, really.

3 comments

Google disagree with your assessment:

https://productforums.google.com/forum/#!searchin/en/cookie$... (see the response marked best answer by Matt Cutts, head of web spam at Google).

If I had never been to the site, I'd land on the unfiltered page, which would be a good result. If I had a cookie (which seems to be a session cookie, from a quick look), then I was likely at the site recently, so the filters are likely relevant; if not, they are easy to change.

'Cloaking' has negative connotations and is more of a concern when there is an attempt to mislead search engines. In this instance, there is a big problem with your suggested fix -- the Panda algorithm would see many very similar pages, which might actually make things worse (which I agree is silly, as your solution would otherwise have some upsides, but there is often a trade-off in these situations).

That's a simplistic way of thinking about the problem -- as a search engine professional (not SEO), I'd never recommend something that depends on GoogleBot figuring out that I'm not really cloaking.

The duplicate content problem you describe is fixable (edit: and is already a problem, I'm only recommending changing links, not adding any pages to the site.)

And by the way, there are plenty of websites that force crawlers to use cookies in order to crawl the site. I don't know how GoogleBot deals with that, but I bet it involves crawling with cookies... no matter what the forum post says.

Yeah - I don't deny that there is possibly some level of risk. But if your concern is "GoogleBot figuring out that I'm not really cloaking" based on the presence of cookies, then I'd challenge (what I think is) your implication that having cookies on your site means Googlebot might suspect you of cloaking.

As to Googlebot's use of cookies - there is debate and folklore, but in the tests I have run I have not seen Googlebot ever send back a cookie that I have sent it.

Google do manual reviews of pages, and I am confident the site in this example (for the case in question, at least) would pass that without a problem.

I'm (genuinely) interested in your proposed solution for dealing with the duplicate content problem. The problem with the Panda algorithm is that it tends to be a bit touchy, and it seems easy to fall foul of it even in innocent situations like this one.

That's not my implication, nor what I said! I said that this website should choose a link method which is unambiguously not cloaking. Then there's no chance that you'll confuse search engine bots.

The duplicate content issue is not in play for my suggestion; as my edit above states, I'm only recommending changing links, not creating any new URLs.

Exactly +1
That's sort of a problem. Also, look at these pages:

http://www.autoaccessoriesgarage.com/Seat-Covers/_Nissan-Alt... http://www.autoaccessoriesgarage.com/Seat-Covers/_Hyundai-So...

The page looks identical, and if you think you're fooling Google into thinking these pages are very different with a couple keyword-stuffed paragraphs of text, think again. Now open both in separate windows and click a product. The product page itself doesn't change except for the vehicle name inserted with a cookie. This looks like you're just mass-generating category and product pages dynamically, which is probably what you're doing.

Don't get me wrong, I feel your pain, and funny enough I've solved this EXACT SAME problem on a similar car accessory site. Maybe I can offer some advice.

Your main Panda problem is that you have a page for every type of product for every make and model. That's a LOT of nearly-identical pages. You need to consolidate them somehow. Easier said than done, right? You don't sell all products for all vehicles, and you want users to have an organic landing page when they search for something like "[make] [model] [accessory]".

Instead of generating these landing pages and making up text, I'd use a filter on your car covers page that sticks the user with a URL variable that stays with them until they change their make/model. This also frees you of the need to make up pointless mass-generated paragraphs.
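One way to sketch a URL variable that "stays with" the user is to copy the filter from the current URL onto each outgoing link. This is only an illustration: the `make`/`model`/`year` parameter names, the `carry_filter` helper, and the example.com URLs are all assumptions, not the site's real scheme:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameter names for the vehicle filter.
FILTER_KEYS = ("make", "model", "year")


def carry_filter(current_url, target_url):
    """Copy the vehicle filter from the current URL onto a target link."""
    current = dict(parse_qsl(urlsplit(current_url).query))
    target = urlsplit(target_url)
    params = dict(parse_qsl(target.query))
    # A link that already picks a vehicle means the user is changing
    # make/model, so leave it untouched.
    if any(key in params for key in FILTER_KEYS):
        return target_url
    for key in FILTER_KEYS:
        if key in current:
            params[key] = current[key]
    return urlunsplit(target._replace(query=urlencode(params)))


here = "http://example.com/car-covers?make=Acura&model=RDX"
print(carry_filter(here, "http://example.com/floor-mats"))
print(carry_filter(here, "http://example.com/floor-mats?make=Nissan&model=Altima"))
```

The filter then travels in the URL rather than in a cookie, so what a crawler fetches matches what a user sees at the same address.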

This truly is frustrating, because the site is actually functioning in a way that makes sense for the user, and Google is penalizing them for it.

Excellent point, Greg. The article does address this, though: check the "U/X vs. Googlebot/X" section.
Right, I've added in the right word (cloaking) and a much better fix. Adjusting cookie policy for crawlers is a bad idea.