by andrenotgiant 4408 days ago
Taking a step back: The "Page" paradigm is still very much alive, despite these recent JavaScript parsing advances.

1. Google still needs a URL-addressable "PAGE" to which it can send Users.

2. This "PAGE" needs to be find-able via LINKS (JavaScript or HTML) and it needs to exist within a sensible hierarchy of a SITE.

3. This "PAGE" needs to have unique and significant content visible immediately to the user, and on a single topic, and it needs to be sufficiently different from other pages on the site so as not to be discarded as duplicate content.


I'd debate the phrase "step back". If you replace all your references to PAGE with URL, you get closer to a real meaning.

URLs for single-page applications are a serialization of application state. The fact that we now have an application platform (JavaScript/HTTP) that provides shareable, mostly human-readable state (URLs) and is also indexed and searchable is nothing short of incredible.

Yes, the basic abstractions we use are the same. We will have URLs that address content in our applications. But now these are applications running on Google's own servers. Google is running my application (and hundreds of thousands more), and trying to understand what they mean to humans. This is a pretty amazing step forward.

Imagine Apple announcing it would run all iOS applications, interacting like a user to build a search index. IMO, this parallel shows what makes Google's commitment to running JavaScript apps exciting.

The point I was trying to make is this:

With every new capability from Googlebot come new opportunities for us to screw it up as developers.

If we were to replace PAGE with URL, and URL is simply a serialization of application STATE, we could easily end up with infinite URLs that lead to STATES that are not really that different, unique or appealing as answers to queries users type into Google.
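To put a rough number on how fast this grows, here is a minimal sketch that serializes every combination of a few state dimensions into a URL. The `/users` path, the parameter names and their values are all made up for illustration; the point is only that the count multiplies with each dimension you serialize.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical state dimensions for a single list view (assumed for illustration).
STATE_DIMENSIONS = {
    "sortBy": ["name", "date", "size"],
    "order": ["asc", "desc"],
    "view": ["grid", "list"],
    "perPage": ["10", "25", "50", "100"],
}


def state_urls(path, dimensions):
    """Yield one URL per combination of state values."""
    keys = list(dimensions)
    for values in product(*(dimensions[k] for k in keys)):
        yield f"{path}?{urlencode(list(zip(keys, values)))}"


urls = list(state_urls("/users", STATE_DIMENSIONS))
print(len(urls))  # 3 * 2 * 2 * 4 = 48 URLs for the same underlying list
```

Four small dimensions already give 48 addressable STATES of one list, and none of them is a meaningfully different answer to a search query.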

When deciding how to build Search-accessible Web Apps, and specifically what to expose to Google, we need to keep in mind that Google likes PAGES that follow the requirements I detailed above.

> these are applications running on Google's own servers. Google is running my application (and hundreds of thousands more)

Which is also very beneficial for Google as they'll likely be the only company doing that for a while, and the one able to do it for the most sites for a long time to come, maintaining Google's search index lead.

As I mentioned in the post, all these problems can be solved by using real paths/URLs and changing them dynamically using pushState.
But the onus is still on the developer to choose _what_ gets a unique URL and what does not.

It might be good for user deeplinking capabilities to change the URL every time any type of state change is made (for example sorting a list by date instead of name), but exposing that many URLs to Google would be bad.

(This is the modern equivalent of the age-old "infinite calendar" problem that Googlebot had to deal with when dynamic calendar apps let you navigate to dates 2 millennia in the future.)
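One way to keep a calendar from becoming infinite, sketched under assumptions: only emit crawlable links for a bounded window of months and leave everything beyond it reachable for users but unlinked. The `/calendar/YYYY/MM` URL scheme, the function names and the 12-month default are hypothetical, not anything Googlebot or a particular calendar app prescribes.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def crawlable_month_links(today: date, horizon_months: int = 12) -> list[str]:
    """Return calendar URLs for the current month plus the next `horizon_months`.

    Months past the horizon get no link here, so a crawler following
    "next month" links stops instead of walking forward forever.
    """
    return [
        f"/calendar/{m.year:04d}/{m.month:02d}"
        for m in (add_months(today, i) for i in range(horizon_months + 1))
    ]


print(crawlable_month_links(date(2014, 11, 15), horizon_months=3))
```

The same idea applies to any state that can be stepped indefinitely: the developer picks the cut-off, which is exactly the "what gets a unique URL" decision.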

I agree with you; developers definitely have to think about the URLs they're exposing to Googlebot. But this is essentially no different from how things were before. Your example with sorting a list by date instead of name would be done with a query string (which Google does index to a point), e.g. /users?sortBy=date&from=392. This can obviously create quite a lot of links to the same content, and developers should know how to handle this situation. Again, not different from before - single page apps don't change anything in this regard.
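For what "handle this situation" can look like in practice, here is a rough sketch of collapsing query-string variants onto one canonical URL, which could then be emitted in a `<link rel="canonical">` tag. Treating `sortBy` as presentation-only is my assumption for this example; which parameters actually leave the content unchanged is a per-site decision, and `PRESENTATION_PARAMS` and `canonical_url` are hypothetical names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed: parameters that reorder the same content without changing it.
PRESENTATION_PARAMS = {"sortBy"}


def canonical_url(url: str) -> str:
    """Drop presentation-only parameters and sort the rest for a stable form."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PRESENTATION_PARAMS
    )
    # The fragment is dropped as well, since it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("/users?sortBy=date&from=392"))  # /users?from=392
print(canonical_url("/users?from=392&sortBy=name"))  # /users?from=392
```

Both sort orders map to the same canonical URL, so the many links still exist for users while only one form is presented as the page's address.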