by Kique 2381 days ago
I love AMP sites that do it the right way, like Politico. Keeps the real domain, loads fast, clean interface. I wish more sites were like this. I think the first version of AMP where the URL was always "google.com/amp/politico/sdgffsdf" was awful but you can now keep the correct domain and I sometimes prefer it to the regular version of a lot of sites.

https://www.politico.com/amp/news/2019/12/04/trump-impeachme...

4 comments

It's nicer than the original AMP setup, but still awful for publishers.

For any user that navigates to your AMP page from a Google search...

The publisher gives up the most important piece of screen real estate, and Google hijacks left/right swipes to navigate to your competitors. They also hijack the back button after a swipe: back means "back to Google", not back to the page I swiped from.

It is pretty much like early AOL. A semi walled garden. It offers some speed benefit for users, but way more benefit to Google.

What's the most important piece of screen real estate that they're giving up?
The top. Top left is red hot on any heatmap that tracks eyeball movement. Google controls what goes there.
The page you linked takes 8s to display on my browser, even on subsequent reloads, just because I don't allow third-party scripts. It also has no displayed images, for the same reason. I really don't wish more sites were like this.
> I think the first version of AMP where the URL was always "google.com/amp/politico/sdgffsdf" was awful

But that has the advantage of making it easier to find the real page rather than the AMP page.

is this served from politico's servers and how is it different from a stripped down version of their site?
It's not. It's using "Signed Exchanges", which Chrome supports, but most other browsers do not.

It's just AMP with some crypto that lets Google masquerade as your domain.

Correction: It lets anyone cache your page, not just Google. And no "masquerading"; that's what the crypto is designed to prevent. Also it's not specific to AMP; you can use signed exchanges with any data served over HTTPS.
It's effectively just Google since it's not widely supported by browsers other than Chrome. There's also only one CA provider that can create the right certificate for SXG.

Or maybe you have some notable examples of SXG being used in a production non-AMP scenario?

The standard is brand new, and AMP was the motivating factor for its creation, so obviously the majority of existing use cases are AMP-related. That doesn't mean you couldn't go and implement a non-AMP use case in your own production site today.
One interesting use case for SXG is to allow decentralised and offline websites, since the site's data can be tied to a key/certificate/domain without having to be downloaded from a specific server. As an example, the IPFS project is already trialling the technology:

https://github.com/ipfs/in-web-browsers/issues/121

oh wow, Signed Exchanges are worse than AMP!

"make sure you are visiting mybanksite.com" is no longer safe.

> oh wow, Signed Exchanges are worse than AMP!

> "make sure you are visiting mybanksite.com" is no longer safe.

Sounds like you don't trust public key based content signing. This is just broadening public key based signatures beyond the domain to include the domain and the content itself, and using signing to make the authenticity of the content independent of the physical infrastructure that served it.

That's what's being used here to verify the authenticity of the content's source, just like PGP/GPG does for signed emails.

That's a far stronger guarantee than "the data is authentic because it came from IP address range X purchased by company Y".

In fact, without such a signature, there is no guarantee that a piece of content is authentic just because it came from a particular server/datacenter.

With signed exchanges, the chain of authenticity is pushed all the way back to the website's content creators - it doesn't stop at the web server. Also, this can't be phished unless you break the content signing algorithms, and if that happens ... we all have bigger problems.
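To make the "authenticity independent of the server" idea concrete, here is a minimal Python sketch. It is not how signed exchanges are implemented: real SXG relies on certificate-backed signatures over the HTTP exchange, whereas this swaps in a Lamport one-time signature purely because it can be built from the standard library's `hashlib`. All function names are made up for illustration.

```python
import hashlib
import secrets


def _h(data):
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Return (private, public); each holds 256 pairs, one pair per digest bit."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _bits(message):
    # The 256 bits of SHA-256(message), most significant bit first.
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def sign(private, message):
    """Reveal one secret per digest bit. A Lamport key must be used only once."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public, message, signature):
    """Check content against the publisher's public key alone."""
    if len(signature) != 256:
        return False
    return all(
        _h(sig) == public[i][bit]
        for i, (sig, bit) in enumerate(zip(signature, _bits(message)))
    )


private, public = generate_keypair()
page = b"<html>article body</html>"
signature = sign(private, page)

# Any cache may deliver `page` and `signature`; the reader needs only `public`.
print(verify(public, page, signature))              # True
print(verify(public, page + b"<form>", signature))  # False
```

The check never asks which machine delivered the bytes, which is the property being argued about in this thread.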

first, it breaks the URL specification, as the "host" is no longer a host. it breaks user's expectation of one of the VERY FEW things that everyday users understand about the internet.

one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache, and then use it to phish customers from within the bank's domain. Or just use a stolen key to make thousands of such pages before the bank finds out. I think, contrary to what you say, it's a brand new, major attack surface.

> first, it breaks the URL specification, as the "host" is no longer a host.

By this definition, "host" hasn't been a host in a long time, since the time it was possible to route DNS traffic to multiple IP addresses, possibly in different datacenters.

> it breaks user's expectation of one of the VERY FEW things that everyday users understand about the internet.

How is signing content directly less authentic than signing only at the web server? Signing content directly at the time of publishing ensures that it was created using the private keys of the entity in question, regardless of the delivery mechanism for the content.

> one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache,

Signed content exchanges specifically limit that by putting the content signing step at the content creator level, not the web server level. Unless you steal the content creator's private keys, you can't represent your content as theirs.

> first, it breaks the URL specification, as the "host" is no longer a host.

Really, how so? RFC 3986 goes out of its way to make clear that the "host" component doesn't mean DNS, and doesn't even have to denote a host.

"In other cases, the data within the host component identifies a registered name that has nothing to do with an Internet host."

"A URI resolution implementation might use DNS, host tables, yellow pages, NetInfo, WINS, or any other system for lookup of registered names."

> it breaks user's expectation of one of the FEW things that everyday users understand about the internet.

What, exactly and concretely, is that expectation?

> one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache, and then use it to phish customers from within the bank's domain.

If the attacker can upload arbitrary pages to the bank's website, just why would they need signed exchanges? They've already got their phishing page on the correct domain.

> one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache

Only if you have the bank's private key, and the ability to serve arbitrary content from the bank's domain. In which case... yeah, I don't see how the signed exchanges standard makes that problem significantly worse.

I hadn't realized the content was actually signed; I assumed we were simply trusting Google to send us the content they said they were sending (much like we do when using the Google cache). I'm curious now: would it be possible to use the content/markup intended for use by the amp cache to view a static/unscripted/readable version of the page's main content? If so, why hasn't anyone built a browser extension to do so?

On a broader note, this also sounds like it could be used to allow caching proxies to work with https; you'd lose the privacy, but you'd gain from being able to cache content on local network if the browser only had to verify the content, and you trusted the cache not to spy on you.

> I'm curious now: would it be possible to use the content/markup intended for use by the amp cache to view a static/unscripted/readable version of the page's main content? If so, why hasn't anyone built a browser extension to do so?

If the goal is to get around the AMP CDN, you don't even need to read the main page content. The AMP URL contains the original source URL itself [1].

The extension you are describing would just need to capture all requests with the prefix https://www.google.com/amp (or whatever CDN you are trying to get around), parse out the original URL, and then fetch it, and do what you will with it.
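A rough sketch of that parsing step, written in Python for illustration (a real extension would be JavaScript). It assumes the Google viewer layout in which the publisher's host and path follow `/amp/`, with a leading `s/` segment marking an HTTPS origin; other AMP caches use different layouts, and the function name is hypothetical.

```python
from urllib.parse import urlsplit

# Assumed viewer host and path prefix, taken from the comment above.
AMP_VIEWER_HOST = "www.google.com"
AMP_VIEWER_PREFIX = "/amp/"


def original_url_from_amp(amp_url):
    """Return the publisher URL embedded in a Google AMP viewer URL, or None."""
    parts = urlsplit(amp_url)
    if parts.netloc != AMP_VIEWER_HOST or not parts.path.startswith(AMP_VIEWER_PREFIX):
        return None
    rest = parts.path[len(AMP_VIEWER_PREFIX):]
    # Assumption: a leading "s/" segment means the origin is served over HTTPS.
    if rest.startswith("s/"):
        scheme, rest = "https", rest[2:]
    else:
        scheme = "http"
    if not rest:
        return None
    url = f"{scheme}://{rest}"
    # Assumption: any query string belongs to the publisher URL and is passed through.
    if parts.query:
        url += "?" + parts.query
    return url


print(original_url_from_amp("https://www.google.com/amp/s/www.example.com/story.amp.html"))
# https://www.example.com/story.amp.html
```

Note that what comes back is the publisher's AMP document URL, which is not necessarily the canonical non-AMP page.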

If the goal is to disable scripting on the AMP CDN delivered content, first note that AMP pages can't contain page-author-written JS [2], and any implicit JS has to run async.

But if that's insufficient, you can disable JS in the browser altogether, which would disable it in the loaded AMP content.

You could also try to parse out the main content in your extension if you know from the URL that it's an AMP page. Because AMP forces HTML content to be relatively terse and simple, it is probably easier to parse than the original page's content. Obviously that won't generalize easily given the large variety of possible content representations, but you stand a better chance of achieving this with AMP content than with the original content.
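As a rough illustration of that kind of parsing, here is a Python sketch using only the standard library's `html.parser`. It checks whether the root element carries the `amp` (or lightning emoji) attribute that AMP documents use, and collects text outside script/style-like elements. The class and function names are hypothetical, and a real reader view would need far more heuristics than this.

```python
from html.parser import HTMLParser

# Elements whose text is never reader-visible article content.
SKIPPED = {"script", "style", "noscript", "template"}


class AmpTextExtractor(HTMLParser):
    """Collects visible text and notes whether <html> carries the AMP marker."""

    def __init__(self):
        super().__init__()
        self.is_amp = False
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            # AMP documents mark the root element with `amp` or the lightning emoji.
            names = {name for name, _ in attrs}
            self.is_amp = bool(names & {"amp", "\u26a1"})
        if tag in SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_amp_text(html):
    """Return (is_amp, list of visible text chunks) for an HTML string."""
    parser = AmpTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.is_amp, parser.chunks


sample = (
    "<html amp><head><style amp-boilerplate>body{opacity:0}</style></head>"
    "<body><h1>Headline</h1><p>First paragraph.</p></body></html>"
)
print(extract_amp_text(sample))  # (True, ['Headline', 'First paragraph.'])
```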

And if you generalize it enough, you will end up with one component of a web crawl / indexing system in an extension ;)

1. https://blog.amp.dev/2017/02/06/whats-in-an-amp-url

2. https://amp.dev/about/how-amp-works/

I’m not sure you understand the purpose of https. Ensuring integrity of the document served by the server is only one small piece of it.

The other critical components are:

encryption so middleboxes can’t see what you’re looking at

guarantee (via the PKI) that the server you’re about to send your banking credentials to is using a cert that belongs to the domain name in the address bar that you trust sending your credentials to.

> encryption so middleboxes can’t see what you’re looking at

> guarantee (via the PKI) that the server you’re about to send your banking credentials to is using a cert that belongs to the domain name in the address bar that you trust sending your credentials to.

The purpose of SXG is to allow publisher signing of edge-cache accelerated public content - i.e. it's read-only - not to encrypt private information like credentials in transport. Https still handles encrypted transport independently of SXG.

Also, why or how would someone create a system that accepted private info or credentials via signed SXG anyways? There's literally no mechanism in it to achieve that. If you tried to build a password entry field for your bank website and distributed it via SXG, it wouldn't even work in the first place.

I don't think there's a real phishing risk with them, but I object to Signed Exchanges because they are actively making the browser lie to me about the URL being used.
The URL the browser shows is the one which was cryptographically verified to be correct. I don't see how you can call that a "lie".

If I'm offline and I open an offline cached page in my browser, would you call it a lie if the browser displays the URL I originally downloaded that page from in the URL bar instead of saying it came from "your hard drive"?

It's not just us HN commenters that are concerned. Mozilla, for example, is highly opposed to it in its current state.

"Mozilla has concerns about the shift in the web security model required for handling web-packaged information. Specifically, the ability for an origin to act on behalf of another without a client ever contacting the authoritative server is worrisome, as is the removal of a guarantee of confidentiality from the web security model (the host serving the web package has access to plain text). We recognise that the use cases satisfied by web packaging are useful, and would be likely to support an approach that enabled such use cases so long as the foregoing concerns could be addressed."

Mozilla has the proposal marked as "harmful".

Apple/Webkit have concerns as well: https://news.ycombinator.com/item?id=19679621

> I don't see how you can call that a "lie".

It's a lie because the URL being displayed does not reflect the source of the bits.

> If I'm offline and I open an offline cached page in my browser, would you call it a lie if the browser displays the URL I originally downloaded that page from

That's a bit of a gray area. Yes, it is a lie (the browser should provide an indication of the actual source of the bits). On the other hand, the cache was created by you and exists on your own machine, so it's more of a little white lie in that case.

How is it no longer safe?
Phishing
They can't alter the content - that's where the 'signed' part comes in. Any forms there would still go to the original source.
I believe it's done via signed exchange. You are free to host it wherever, I think.

https://amp.dev/documentation/guides-and-tutorials/optimize-...