by jacquesm 4406 days ago
One potential problem here is that google will use this to widen the gap between it and the 'one page apps' web and other search engines (such as duckduckgo) that can't match it in resources.

How strong an advantage that will be in the long run is uncertain. I would rather see a web that ships pages with actual content in them than empty containers, for a variety of reasons (most of which have to do with accessibility and the fact that not all clients are browsers or even capable of running JavaScript).

This 'new web' is going off in a direction that is harmful; coupled with the mobile app walled gardens, it is turning back the clock in a hurry.

I'm fairly sure this is not the web that Tim Berners-Lee envisioned.

10 comments

I agree 100%, and see other problems for users as well. The worst web pages I encounter are the javascript-constructed DOM pages. And by far the worst offender is the odious Google Web Toolkit.

It may be that I encounter some of these "modern" pages without knowing it because some dev has put in the time to make them work well, but it seems that the vast majority are absolutely terrible. It's a small decrease in developer effort for a huge decrease in user satisfaction. I remember the terrible, terrible #! days of Twitter with sadness.

Some developers have a tendency to go for the internally sophisticated/beautiful in preference to the best experience for the user. I hope that blog posts like this one don't let loose these developers' worst tendencies.

I don't see this as a problem at all, here is why:

Google doesn't really have competitors in search, they just don't. I mean look at how we even define 'searching the internet' in our spoken language: 'google it'. Google has become the identity of search on the web in most people's minds. And to top it off, they are really, really good at it. Google getting better is going to widen the gap between them and everyone else, but the gap is already pretty damn wide. Was there really any chance of someone catching them in any foreseeable future?

But the wider that gap gets, the more motivation there is not to attack the gap, but to go in a different direction altogether. Nobody talks about duckduckgo because their search results are better than google's, they talk about them because duckduckgo is all about your privacy. They found a different way to make a search that people want to use. The wider that gap gets, the more motivated some will be to try something truly novel to compete with google.

> Nobody talks about duckduckgo because their search results are better than google's, they talk about them because duckduckgo is all about your privacy.

'Nobody' is such a strong word.

http://devblog.avdi.org/2014/02/16/why-duckduckgo-is-better-...

>I'm fairly sure this is not the web that Tim Berners-Lee envisioned.

So? None of these are convincing arguments for application developers. Having to rewrite an application to perform all its UI logic on the server side in addition to client side is a lot of work, for almost no benefit to the people paying to make the application.

As a user, so long as URLs work so I can send locations to other people, most of my accessibility scenarios are solved.

The rest is simply a lack of technology in other clients.

The problem is not about some God-like commandment. It is about the original design of the Web, which we all believe was the reason for its success.

When you receive a GET request for a URL, and the browser tells you it accepts text/html, it is expected that you answer with the content stored at that URL in the format requested. It is not expected that you answer with an application that, when run, will eventually produce the content.

The correct way to do what this post is saying is to create a new MIME type for this content delivery method. Then, if the browser actively tells you it accepts that content type, deliver it.

What the OP proposes is not text/html. It's something else.
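A minimal sketch of that negotiation in Python, assuming a made-up media type for "an application that renders the content". The name `application/x-js-rendered-html` is hypothetical (no such type is registered), and `choose_representation` is an illustrative helper, not part of any framework:

```python
# Hypothetical media type; invented here purely for illustration.
APP_SHELL_TYPE = "application/x-js-rendered-html"


def accepted_types(accept_header):
    """Parse an Accept header into a {media_type: q} dict."""
    types = {}
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        types[media_type] = q
    return types


def choose_representation(accept_header):
    """Serve the JS shell only to clients that explicitly list its type.

    A wildcard such as */* is deliberately not treated as opting in, so
    every other client falls back to plain text/html content.
    """
    types = accepted_types(accept_header)
    if types.get(APP_SHELL_TYPE, 0.0) > 0.0:
        return APP_SHELL_TYPE
    return "text/html"
```

With this, a typical browser header like `text/html,application/xhtml+xml,*/*;q=0.8` still gets text/html, and only a client that names the new type gets the application.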

I'm fairly sure the web has not been the web that Tim Berners-Lee envisioned for a long time now.
.... and that's a very good thing per se. The idea that the development of something as important and universal to the web should be limited to what one man (no matter how visionary) was able to envision a couple of decades ago is beyond bizarre.
It's not difficult to set up middleware that'll render the page for any clients that require it. (For instance, we can assume any client that identifies as "bot" that's not Google probably wants a pre-rendered page, which we can do quite effortlessly.) Here's one implementation for Node.js: https://prerender.io, or you can always roll your own with something like PhantomJS.
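The user-agent rule described above could be sketched roughly like this in Python. `wants_prerendered` is a hypothetical helper name, and the substring checks are an assumption about how one might identify bots, not how prerender.io does it:

```python
def wants_prerendered(user_agent):
    """Return True for clients that identify as a bot other than Googlebot.

    This is the rule of thumb from the comment: anything with "bot" in its
    User-Agent string, except Google's crawler, gets a pre-rendered page.
    """
    ua = user_agent.lower()
    if "googlebot" in ua:
        return False
    return "bot" in ua
```

A middleware would call this on each request and hand matching requests to a headless renderer, while everyone else gets the normal JavaScript page.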
Note that sending a different response to googlebot than what you send to normal users is a violation of Google's guidelines and can get your site penalized. Use at your own peril.
Not necessarily. This concept of HTML snapshots is actually suggested by Google as a solution.

See this: https://developers.google.com/webmasters/ajax-crawling/docs/...

We comply with Google's guidelines (https://snapsearch.io/documentation)

Nowhere in that article does it say it's ok to only serve the snapshot to googlebot. Serving different content to googlebot than what you serve to users is called cloaking and is against their guidelines: https://support.google.com/webmasters/answer/66355

I've invested a significant amount of time in this topic and would love if you were right, but I've never seen the money quote that it's ok to do this. In fact, everything I've read says that you have to treat search bots the same as you treat normal users.

The title of the article says "How do I create an HTML snapshot?"

The FAQ (https://developers.google.com/webmasters/ajax-crawling/docs/...) goes into more detail regarding the concept of _escaped_fragment_, which is used so that your server can respond with a static snapshot instead of JavaScript.

Look at the diagram at the bottom of this page (https://developers.google.com/webmasters/ajax-crawling/docs/...)

It explicitly says "snapshots"
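For anyone unfamiliar with the scheme: the crawler rewrites a #! URL into one carrying an _escaped_fragment_ query parameter, and the server maps it back to decide which snapshot to return. A rough Python sketch of that mapping follows. The function names are made up, the set of characters left unescaped is an assumption that may be stricter than the scheme requires, and the reverse mapping assumes the parameter is the last one in the query string:

```python
from urllib.parse import quote, unquote

MARKER = "_escaped_fragment_="


def to_crawler_url(pretty_url):
    """Rewrite a hashbang URL into the form a crawler would request."""
    base, sep, fragment = pretty_url.partition("#!")
    if not sep:
        return pretty_url  # no hashbang, nothing to rewrite
    joiner = "&" if "?" in base else "?"
    # Percent-encode the fragment so it is safe inside a query string.
    return base + joiner + MARKER + quote(fragment, safe="=/")


def to_pretty_url(crawler_url):
    """Recover the hashbang URL so the server knows which state to snapshot."""
    base, sep, escaped = crawler_url.partition(MARKER)
    if not sep:
        return crawler_url
    # Drop the "?" or "&" that joined the marker to the rest of the URL.
    return base.rstrip("?&") + "#!" + unquote(escaped)
```

So `http://example.com/page#!key=value` is requested as `http://example.com/page?_escaped_fragment_=key=value`, and the server answers that request with the HTML snapshot.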

wow this is amazing. would love to see an offshoot of this where it could render a sitemap, or even keep a live sitemap up to date via cron.d or something (just hoping out loud)
You can use SEO4Ajax [1] to crawl your SPA and generate an up to date sitemap dynamically.

[1] http://www.seo4ajax.com/

We're using Brombone. You give them your sitemap, they do the prerendering and save it all. Then you proxy to them for Googlebot and others.
Dynamic construction of sitemaps is surprisingly difficult. We would need to poll every single page you have, just to check whether you potentially have a new link to a new page on your site. And every time you add a new page, that's a whole other page to scrape and analyse.
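To make the cost concrete, here is a rough sketch in Python of what such polling amounts to: a breadth-first crawl that has to fetch every page once just to learn whether it links anywhere new. `build_sitemap` and the injected `fetch` callable are hypothetical. In a real setup `fetch` would have to return the rendered HTML, which is the expensive part, and fragments are dropped here, so hashbang routes would need different handling:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def build_sitemap(start_url, fetch):
    """Return the sorted same-host URLs reachable from start_url.

    fetch(url) must return the page's HTML as a string, or None if the
    page is missing. It is called once for every discovered URL.
    """
    host = urlsplit(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # dead link: not part of the sitemap
        pages.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and ignore in-page anchors.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlsplit(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(pages)
```

Keeping the sitemap live means re-running this over the whole site on a schedule, since any page may have gained a link since the last pass.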
At the head of the index it probably won't make much difference; big sites will render the initial page on the server, and it's the long tail where it will be client only. It may make scaling a website easier though, as frameworks will start making adding or moving rendering from client to server trivial. So the path from the long tail to the head will be easier to navigate. (if you'll excuse me mixing my metaphors)
duckduckgo is mainly a meta search engine (it relies on the search API of Yahoo, which itself relies on Bing, Yandex, etc.). Plus it shows some related snippets from Wikipedia and other data sources.

Several well-known web search engines are now defunct or have switched to the meta-search business (like Yahoo with Bing data).

There are only a few international/world-wide search engines with a crawler:

Google, Bing, Yandex, Baidu, Gigablast, (Archive.org/Wayback Machine)

Berners-Lee envisioned a decentralised peer-to-peer information sharing network where everyone was a server and a client.
One page apps that aren't crawlable don't want to be crawled, don't do the necessary work, or are simply incompetent. Making an AJAX site crawlable isn't exactly rocket science.

The gap this will really widen is the one between sites that do the necessary work themselves, and those who don't.

> One potential problem here is that google will use this to widen the gap between it and the 'one page apps' web and other search engines (such as duckduckgo) that can't match it in resources.

There are free and open source tools available that would help search engines parse pages containing JS (PhantomJS comes to mind).

It's not just tools, it's the cost of all that parsing and executing in a mock browser environment.