by gregable 2625 days ago
The browser displays the URL from the origin that digitally signed the unmodified content.

A browser already doesn't show you what server delivered the content. That would be your wifi AP, cell phone tower, or ISP node. The internet has already long established that we can trust content without trusting intermediaries.

There are two elements that are important: integrity and privacy. The content integrity is protected via a digital signature, the "signed" part of "signed http exchanges". The signature proves that the document hasn't been tampered with.
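
As a rough illustration of the integrity half, here is a minimal sketch of sign-then-verify where the verifier only needs the public key. It uses textbook RSA with tiny, insecure numbers purely for demonstration; it is not the signature scheme that signed exchanges actually use, and every name in it (`sign`, `verify`, `page`) is made up for the example.

```python
import hashlib

# Toy key pair built from two Mersenne primes. Far too small to be secure;
# chosen only so the example runs with the standard library.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, held only by the origin


def digest(content: bytes) -> int:
    """Hash the content and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N


def sign(content: bytes) -> int:
    """Origin side: sign the content digest with the private exponent."""
    return pow(digest(content), D, N)


def verify(content: bytes, signature: int) -> bool:
    """Browser side: check the signature using only the public key (N, E)."""
    return pow(signature, E, N) == digest(content)


page = b"<html><body>Original article</body></html>"
signature = sign(page)

# An intermediary can relay the page and signature unchanged...
relayed_ok = verify(page, signature)
# ...but any modification in transit makes verification fail.
tampered_ok = verify(page.replace(b"Original", b"Modified"), signature)
```

The point of the sketch is that whoever delivers the bytes never needs the private key, so delivery and authorship can be separated.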

Regarding privacy: The intermediary (a search engine in this case) already has the content being delivered as a result of crawling it. It also knows the user clicked on a link to get that content, and knows the user's ip address. Even without AMP or Signed Exchanges, the privacy situation is the same. Once the page is loaded, all further interactions with the origin are normal https traffic, so later requests are not different in privacy either.

What this enables, for search results, is the ability to load the bytes of the content before the user clicks a search result. If the browser prefetched those bytes with the origin's awareness, then the user's privacy with respect to the search query would be violated, making prefetch problematic. With this setup, documents can be prefetched while preserving user privacy and after the user clicks all browser behavior continues as normal from that point forward.


AMP allows Google to see exactly how you interact with every page on the internet.

Just from the text of the pages you visit they can build a profile around you. What your interests are, how much of an article you're likely to finish, whether you're the type of person to highlight text as you read, etc.

Unless you live on an island with a poor satellite connection AMP is useless as anything more than a corporate user data collection tool.

AMP documents don't share user data with Google, which can be trivially seen by inspecting the network events that the page generates.

If the publisher chooses, they can send logging to Google Analytics, but this is not part of AMP.

The typical argument otherwise is that the AMP javascript is loaded from Google's cache, however these javascript resources allow for a very long cache lifetime (1yr if the page came from the Google Cache), so relatively few page loads will actually end up fetching them from the network for most users.

Edit: These resources are also on cookieless domains.
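
To make the cache-lifetime point concrete, here is a small sketch of the freshness check a cache performs. The header value is an assumption standing in for "1yr", not a capture of what the AMP cache actually sends, and the helper names are hypothetical.

```python
import re
from typing import Optional

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000 seconds

# Assumed header value for illustration only.
ASSUMED_CACHE_CONTROL = f"public, max-age={ONE_YEAR_SECONDS}"


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Extract the max-age directive, or None if it is absent."""
    match = re.search(r"max-age=(\d+)", cache_control)
    return int(match.group(1)) if match else None


def needs_network_fetch(cache_control: str, age_seconds: int) -> bool:
    """A cached copy is fresh only while its age is below max-age."""
    max_age = max_age_seconds(cache_control)
    if max_age is None:
        return True  # no lifetime given: treat as not reusable in this sketch
    return age_seconds >= max_age


thirty_days = 30 * 24 * 60 * 60
fetch_after_month = needs_network_fetch(ASSUMED_CACHE_CONTROL, thirty_days)
fetch_after_year = needs_network_fetch(ASSUMED_CACHE_CONTROL, ONE_YEAR_SECONDS)
```

This only models the freshness rule; a real browser may evict entries earlier for its own reasons.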

> The typical argument otherwise is that the AMP javascript is loaded from Google's cache, however these javascript resources allow for a very long cache lifetime (1yr if the page came from the Google Cache), so relatively few page loads will actually end up fetching them from the network for most users.

Christ this is thin as a privacy argument.

> AMP documents don't share user data with Google, which can be trivially seen by inspecting the network events that the page generates.

Is there anything preventing Google from changing this later?

No, if Google can change the way the web works from day one they can change anything they want. Don't forget Google is killing IMAP and DNS already. Why not HTTP too?
Also, Google explicitly states that it is collecting data in AMP Viewer [1]:

> The Google AMP Viewer is a hybrid environment where you can collect data about the user. Data collection by Google is governed by Google’s privacy policy.

I assume they collect information from the HTTP request the browser sends when requesting an AMP page.

[1] https://developers.google.com/search/docs/guides/about-amp#a...

> AMP documents don’t share user data with Google

They might not now, but couldn’t Google start creating unique URLs on each page, allowing them to track you that way?

They can already do that, and are doing so, through Search, Analytics (maybe), ads, etc. That war is long lost.
They can't if you block all their shitty domains and don't use google services. Things that many privacy-conscious users do.
We are talking about their AMP cache. If you don't use Google services, you'll never get there unless you deliberately prepend their AMP cache URL to your links.

Their AMP cache happens only on their search service. They already know which links you click... having an AMP cache on top doesn't give them MORE information than they already get. The use of that cache also makes sure the website doesn't get more information because it's preloaded.

That's not entirely true though, is it? Any link shared on reddit, or here, or on any social network by a Chrome user can be an AMP one.
If (or when) the share of privacy-conscious users rises, Google might motivate webmasters to compile GA scripts into the main JS script, and considering that pretty much any website nowadays just doesn't show content without JavaScript enabled, it would be much harder to avoid.
I browse mostly without javascript on and that's not true; easily more than half of websites work just fine without it, and that number goes far up if you accept some lack of features. Though there are some that indeed don't work at all.

Although your point is well taken that there could be ways to sneakily track users eventually despite the aforementioned measures, and potentially even without javascript being required (though I doubt that the share of privacy-conscious users will ever rise significantly - most people simply don't care).

No excuse.
Google can't tell if a link has been clicked if JavaScript is off and the `ping` attribute is removed, so AMP removes privacy there.

By forcing web publishers to host their content on a Google cache, they lose their server-side logging and the ability to determine the way they serve their own sites.

Also, why do you artificially slow page loads on AMP pages to 8 seconds when JavaScript is disabled? That is a privacy issue.

The linker (google in this case) could rewrite the link to use a redirector if they choose. If Javascript is off, AMP and thus Signed Exchanges are disabled on Google search results anyway.
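
A minimal sketch of what such a redirector rewrite looks like, assuming a hypothetical redirector at `linker.example` with a `q` parameter (neither is taken from any real service):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical redirector endpoint, for illustration only.
REDIRECTOR = "https://linker.example/url"


def rewrite_link(target_url: str) -> str:
    """Wrap the real destination so the click first hits the redirector."""
    return f"{REDIRECTOR}?{urlencode({'q': target_url})}"


def resolve_redirect(redirect_url: str) -> str:
    """What the redirector does on a click: recover the real destination."""
    return parse_qs(urlsplit(redirect_url).query)["q"][0]


original = "https://publisher.example/article?id=7"
rewritten = rewrite_link(original)
destination = resolve_redirect(rewritten)
```

Because the click is an ordinary navigation to the redirector, this works with JavaScript off and without the `ping` attribute.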

You misunderstand the 8 second CSS animation in the AMP boilerplate. Here's the code (simplified):

  <style>
    body { animation:-amp-start 8s steps(1,end) 0s 1 normal both}
    @keyframes -amp-start{from{visibility:hidden}to{visibility:visible}}
  </style>
  <noscript>
    <style amp-boilerplate>
      body{animation:none}
    </style>
  </noscript>
See the noscript section: if javascript is disabled, the CSS displays the body immediately. If Javascript is enabled, but for some reason the AMP javascript fails to load, after 8 seconds, the page is displayed anyway. The page is probably somewhat broken without the javascript loading, but the 8s is a fallback, not code to slow down non-javascript browsers.
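
The three cases can be written out as a tiny model. This is only a sketch of the logic described above, not AMP code; it assumes the AMP runtime makes the body visible as soon as it loads, and the function name is made up.

```python
from typing import Optional

FALLBACK_SECONDS = 8.0  # duration of the -amp-start animation in the snippet


def seconds_until_visible(js_enabled: bool,
                          runtime_load_seconds: Optional[float]) -> float:
    """Return when the body becomes visible.

    runtime_load_seconds is None when the AMP script never loads
    (for example because it is blocked).
    """
    if not js_enabled:
        # The <noscript> style applies: animation:none, so no hiding at all.
        return 0.0
    if runtime_load_seconds is not None and runtime_load_seconds < FALLBACK_SECONDS:
        # Assumption: the runtime reveals the page once it has loaded.
        return runtime_load_seconds
    # JavaScript is on but the runtime never arrived: the CSS fallback fires.
    return FALLBACK_SECONDS


no_js = seconds_until_visible(js_enabled=False, runtime_load_seconds=None)
normal = seconds_until_visible(js_enabled=True, runtime_load_seconds=0.4)
blocked = seconds_until_visible(js_enabled=True, runtime_load_seconds=None)
```

The third case is the one to watch: JavaScript is enabled, so `<noscript>` does not apply, but the runtime is missing, so the full 8 seconds elapse.
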
There are legitimate (privacy/speed) reasons to not load AMP's JavaScript while still not turning off JavaScript entirely. Google does have the capability to know when you're on an AMP page, because the JS loads from ampproject.org, which is registered by Google.

An 8-second delay seems like an intentional "bug" to coerce users to turn on JavaScript (and advertising).

The javascript is heavily cached, so will not give a request on every page load.

That is not the intention. If javascript is disabled entirely, Google Search won't even load AMP pages. The scenario you describe of a user loading an AMP page directly without javascript enabled is somewhat rare.

Many people use tools to block third party JS from loading. AMP can't be called privacy-friendly while making it extremely difficult to use when tracking (AMP Analytics) is blocked. The 8-second delay happens to me every time I accidentally click an AMP URL in my browser.
I don't use Google Search, and I frequently get sent to Google's AMP cache via other link sources (e.g. HN).

I don't have javascript blocked, but I do have Google's tracking blocked via standard tracking protection (which is now a built-in feature in most non-Google browsers), which means <noscript> tags are not triggered, and I get the 8 second delay due to non-loading JS resources.

I don't think my setup is as rare as you make out.

It is clear that the current developments on the web are worrisome and we need real privacy. We need to be able to find a website and visit it completely anonymously, unless we actively submit information to said website or a court order is issued.

A cell phone tower or ISP node is ideally just infrastructure, "plumbing". Google seems to be trying to advance their strategic position in that direction. Rather than just being one search engine among several, they are trying to become part of the infrastructure. This could prevent future privacy solutions (and even prevent competition between search engines).

The real reason to make this spec is not to improve integrity or privacy or something else but to make users stay on Google's domain instead of going to other sites. Google wants to build its little walled garden, and this spec is needed to make users think that the walls aren't there.

> With this setup, documents can be prefetched while preserving user privacy and after the user clicks all browser behavior continues as normal from that point forward.

But Google can already preload and show a cached version of the page without this spec. The only difference would be that the address bar shows "google.com" instead of the publisher's domain. There is no need for this specification.

> A browser already doesn't show you what server delivered the content. That would be your wifi AP, cell phone tower, or ISP node.

No. Incorrect. Completely backwards. Factually wrong. You just failed your networking-exam.

Those things you mentioned would be transparent networking nodes forwarding your TCP-packets and they have nothing to do with any layers above that.

The fact that you don’t even know this completely invalidates any other point you may have.

I think you're missing the point of the GP. It says that you don't know and you don't care which particular server returns your content - is it a self hosted machine, is it a cloud machine, is it a CDN? No way of knowing unless you inspect the deeper stack. What you do see very visibly is which BRAND (i.e. URL) returned your content.

So this AMP exchange technology changes nothing in this regard. It's like Google provides its own free CDN; it is just not done in a traditional manner.

> It says that you don't know and you don't care which particular server returns your content

Which is plain wrong. I care.

When the URL-bar says I’m looking at company.com, I expect my browser to have used my OS’s DNS-resolver to look that name up, connect to the IP-given and nothing else.

I certainly don’t expect it to send traffic to certainly-not-the-nsa.com, which is MITMing my traffic and tracking/monitoring it.

If I can’t trust my browser’s URL-bar to exclusively and accurately reflect what is actually requested, it is effectively lying to me, the user, its owner.

And then suddenly all URLs are phishing URLs because Google made URLs no longer matter or mean anything.

Completely unacceptable.

My point is that even if you look at the URL bar currently and it says company.com, you don't know what you're connecting to. Probably you're connecting to CloudFlare/CloudFront/Akamai/Fastly/any other CDN which is set up with good-enough certs to impersonate the domain. Therefore you're not trusting a particular server, you're trusting a relationship that the domain owner built with their service providers.

The proposed scheme is just another way to extend this kind of relationship that the publisher builds, a new mechanism if you will. There is nothing in there that requires more or less trust from your part than before.

You're complaining that you need URLs to reflect what is requested - in fact, I argue that you want the URL to tell you what is being served. But this is not what's currently happening.

URLs are already lying to you.

I doubt that you WHOIS-lookup all DNS-resolved IPs to verify that the IP presenting a cert is assigned to the organisational entity that you want to connect to, and have a whitelist of those entities that you actually allow your browser to connect to. Because that's what's currently required to make sure you don't go through CDNs and other intermediaries between you and the publisher.

Using a CDN currently means the company uses trusted mechanisms like DNS to delegate certain traffic to other providers (like with Cloudflare). And it does so for everyone.

In which case the URL serves what was requested.

What AMP does is provide google.com content and lie to the user and says it comes from company.com.

Which isn’t true, and it only does so for users coming from google.com. Where I’m sure google will be happy for the additional tracking data.

This is NOT the URL the user was led to believe he requested. This is not what everyone else is served.

This is malware.