by tyingq 2392 days ago
It's not. It's using "Signed Exchanges", which Chrome supports, but most other browsers do not.

It's just AMP with some crypto that lets Google masquerade as your domain.


Correction: It lets anyone cache your page, not just Google. And no "masquerading"; that's what the crypto is designed to prevent. Also it's not specific to AMP; you can use signed exchanges with any data served over HTTPS.
It's effectively just Google since it's not widely supported by browsers other than Chrome. There's also only one CA provider that can create the right certificate for SXG.

Or maybe you have some notable examples of SXG being used in a production non-AMP scenario?

The standard is brand new, and AMP was the motivating factor for its creation, so obviously the majority of existing use cases are AMP-related. That doesn't mean you couldn't go and implement a non-AMP use case in your own production site today.
One interesting use case for SXG is to allow decentralised and offline websites, since the site's data can be tied to a key/certificate/domain without having to be downloaded from a specific server. As an example, the IPFS project is already trialling the technology:

https://github.com/ipfs/in-web-browsers/issues/121

tying it to a domain name (which is the typical use of the URL) breaks the web though. i could understand if the key is used to show that the origin is a twitter account handle or something, but breaking the semantics of the domain by signing the content doesn't make any functional sense. Other than putting lipstick on a pig (AMP) of course
oh wow, Signed Exchanges are worse than AMP!

"make sure you are visiting mybanksite.com" is no longer safe.

> oh wow, Signed Exchanges are worse than AMP!

> "make sure you are visiting mybanksite.com" is no longer safe.

Sounds like you don't trust public key based content signing. This is just broadening public key based signatures beyond the domain to include the domain and the content itself, and using signing to make the authenticity of the content independent of the physical infrastructure that served it.

That's what's being used here to verify the authenticity of the content's source, just like PGP/GPG does for signed emails.

That's a far stronger guarantee than "the data is authentic because it came from IP address range X purchased by company Y".

In fact, without such a signature, there is no guarantee that just because a piece of content came from a particular server/datacenter, that it is authentic.

With signed exchanges, the chain of authenticity is pushed all the way back to the website's content creators - it doesn't stop at the web server. Also, this can't be phished unless you break the content signing algorithms, and if that happens ... we all have bigger problems.

first, it breaks the URL specification, as the "host" is no longer a host. it breaks user's expectation of one of the VERY FEW things that everyday users understand about the internet.

one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache, and then use it to phish customers from within the bank's domain. Or just use a stolen key to make thousands of such pages before the bank finds out. I think, contrary to what you say, it's a brand new, major attack surface.

> first, it breaks the URL specification, as the "host" is no longer a host.

By this definition, "host" hasn't been a host in a long time, since the time it was possible to route DNS traffic to multiple IP addresses, possibly in different datacenters.

> it breaks user's expectation of one of the VERY FEW things that everyday users understand about the internet.

How is signing content directly less authentic than signing only at the web server? Signing content directly at the time of publishing ensures that it was created using the private keys of the entity in question, regardless of the delivery mechanism for the content.

> one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache,

Signed content exchanges specifically limit that by putting the content signing step at the content creator level, not the web server level. Unless you steal the content creator's private keys, you can't represent your content as theirs.

> "host" hasn't been a host in a long time,

Does SXG make this better or worse?

> ensures that it was created using the private keys

signing at the server ensures that it was created using the key AND served from a host they control. How is that not better?

> you can't represent your content

wouldn't the server sign all http responses by default? all you would need to do is upload a file

> wouldn't the server sign all http responses by default? all you would need to do is upload a file

No, the content has to be signed when it is created, in the content management system or similar content creation tool, not when the server sends it. The content management system itself must have strong controls on it (ACLs, controlled user accounts, protected private keys stored only on encrypted and access controlled media, regular audits, etc).

Basically the server itself is no longer trusted as the arbiter of content authenticity, the actual content creator is. Concretely, when the editor at a publication approves an article after reviewing it, it is signed for delivery at the moment of publication, not at the moment that the request is served.

> first, it breaks the URL specification, as the "host" is no longer a host.

Really, how so? RFC 3986 goes out of its way to make clear that the "host" component doesn't mean DNS, and doesn't even have to denote a host.

"In other cases, the data within the host component identifies a registered name that has nothing to do with an Internet host."

"A URI resolution implementation might use DNS, host tables, yellow pages, NetInfo, WINS, or any other system for lookup of registered names."

> it breaks user's expectation of one of the FEW things that everyday users understand about the internet.

What, exactly and concretely, is that expectation?

> one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache, and then use it to phish customers from within the bank's domain.

If the attacker can upload arbitrary pages to the bank's website, just why would they need signed exchanges? They've already got their phishing page on the correct domain.

> RFC 3986 goes out of its way

the RFC uses the word "host" and not "signer". It also says that the "host" is intended to be looked up in some service registry, and there is no such thing for arbitrary signers.

> exactly and concretely, is that expectation

One of the common security advice banks used to give is "check your browser address that you are in our server"

> just why would they need signed exchanges

with signed exchanges they can fool amp to cache the page long after it has been deleted from the server

The RFC explicitly says that "host" doesn't necessarily mean an actual host and you still insist the opposite. So I don't really know what to say.
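To illustrate the syntactic reading of "host": a minimal sketch using Python's `urllib.parse.urlsplit`, which only splits the string into components and never asks DNS or any other registry about the name. The helper name and the example URIs are made up for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit


def host_component(uri: str) -> Optional[str]:
    """Return the host component of a URI without performing any lookup."""
    return urlsplit(uri).hostname


# Hypothetical URIs; nothing here is resolved or contacted.
for uri in ("https://mybanksite.com/login", "https://printer.internal/status"):
    print(uri, "->", host_component(uri))
```

Whether the name is later resolved through DNS, a hosts file, or something else is a separate step that the parser knows nothing about.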

> One of the common security advice banks used to give is "check your browser address that you are in our server"

So you say that everyday users have an expectation that they're "in the bank's server"? That doesn't seem very concrete, since that could mean anything. Surely there is some kind of expectation they have about actual behavior or property. Something that will happen / can't happen right now, but the opposite with signed exchanges.

> Anyone who has the file can intercept the form data from that page now - a complete phishing attack.

Uhh... And just how would they do that? They can't inject anything into the page, and they can't modify the page. How do you figure they force the browser to submit the form to the wrong server?

> One of the common security advice banks used to give is "check your browser address that you are in our server"

"In our server" is a simplification of the technical explanation: "signed by our computers using our private keys before delivery to you". That is still maintained in the case of signed content exchange, except that the transport function is provided by a different server.

It's not much different from, e.g., signing a compiled app with your private keys before uploading it to an app store. Such apps also use hosts to identify themselves and their content, even though they are delivered via app-store mechanisms.

> one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache

Only if you have the bank's private key, and the ability to serve arbitrary content from the bank's domain. In which case... yeah, I don't see how the signed exchanges standard makes that problem significantly worse.

i don't know what's the max expiration for amp's cache, but i could set a really-long expiration date on the file and remove it from the server without the bank ever knowing it existed. SXG doesn't even require an upload - one disgruntled employee could do the same with a stolen key.
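On the expiration question: as far as I can tell from the signed exchanges draft, the signature itself carries `date` and `expires` values, and a client is supposed to reject any signature whose validity window is longer than 7 days. A minimal sketch of that check, assuming this reading of the draft is right; the function and constant names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 7-day cap is taken from the signed exchanges draft as read above.
MAX_VALIDITY = timedelta(days=7)


def signature_window_ok(date: datetime, expires: datetime, now: datetime) -> bool:
    """Reject windows longer than the cap, and signatures not currently valid."""
    if expires - date > MAX_VALIDITY:
        return False
    return date <= now <= expires


signed_at = datetime(2019, 4, 17, tzinfo=timezone.utc)
one_day_later = signed_at + timedelta(days=1)

# A week-long window is accepted while it is current.
print(signature_window_ok(signed_at, signed_at + timedelta(days=7), one_day_later))
# A "really-long" window is rejected outright, even on day one.
print(signature_window_ok(signed_at, signed_at + timedelta(days=365), one_day_later))
```

Under that assumption, a page pulled from the server stops verifying within a week at most, though a stolen key could still be used to mint fresh signatures until the certificate is revoked or expires.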

Nobody benefits from this shit other than Google. Do we really need more attack surfaces?

I hadn't realized the content was actually signed; I assumed we were simply trusting Google to send us the content they said they were sending (much like we do when using the Google cache). I'm curious now: would it be possible to use the content/markup intended for use by the amp cache to view a static/unscripted/readable version of the page's main content? If so, why hasn't anyone built a browser extension to do so?

On a broader note, this also sounds like it could be used to allow caching proxies to work with https; you'd lose the privacy, but you'd gain from being able to cache content on local network if the browser only had to verify the content, and you trusted the cache not to spy on you.

> I'm curious now: would it be possible to use the content/markup intended for use by the amp cache to view a static/unscripted/readable version of the page's main content? If so, why hasn't anyone built a browser extension to do so?

If the goal is to get around the AMP CDN, you don't even need to read the main page content. The AMP URL contains the original source URL itself [1].

The extension you are describing would just need to capture all requests with the prefix https://www.google.com/amp (or whatever CDN you are trying to get around), parse out the original URL, and then fetch it, and do what you will with it.
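A rough sketch of the "parse out the original URL" step, written in Python rather than as an actual extension. It assumes the viewer URL layout described in [1], where `/amp/s/<host>/<path>` means the source is served over HTTPS and `/amp/<host>/<path>` means plain HTTP; the function name is made up, and query strings are ignored.

```python
from typing import Optional
from urllib.parse import urlsplit

AMP_VIEWER_HOST = "www.google.com"  # assumption: only the Google AMP viewer is handled
AMP_VIEWER_PREFIX = "/amp/"


def original_url(amp_url: str) -> Optional[str]:
    """Recover the publisher's URL from an AMP viewer URL, or None if it isn't one."""
    parts = urlsplit(amp_url)
    if parts.hostname != AMP_VIEWER_HOST or not parts.path.startswith(AMP_VIEWER_PREFIX):
        return None
    rest = parts.path[len(AMP_VIEWER_PREFIX):]
    if rest.startswith("s/"):
        # The "s/" segment marks a source that is served over HTTPS.
        scheme, rest = "https", rest[2:]
    else:
        scheme = "http"
    if not rest:
        return None
    return f"{scheme}://{rest}"


print(original_url("https://www.google.com/amp/s/www.example.com/amp/doc.html"))
```

Other AMP caches use different URL shapes, so each CDN you want to get around would need its own prefix rule.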

If the goal is to disable scripting on the AMP CDN delivered content, first note that AMP pages can't contain page-author-written JS [2], and any implicit JS has to run async.

But if that's insufficient, you can disable JS in the browser altogether, which would disable it in the loaded AMP content.

You could also try to parse out the main content from the AMP page in your extension if you know from the URL that it's an AMP page. Because AMP forces relative terseness and simplicity of HTML content, it is probably easier to parse than the original page's content. Obviously that won't generalize easily given the large variety of possible content representations, but you stand a better chance of achieving this with AMP content than with the original content.

And if you generalize it enough, you will end up with one component of a web crawl / indexing system in an extension ;)

1. https://blog.amp.dev/2017/02/06/whats-in-an-amp-url

2. https://amp.dev/about/how-amp-works/

I’m not sure you understand the purpose of https. Ensuring integrity of the document served by the server is only one small piece of it.

The other critical components are:

encryption so middleboxes can’t see what you’re looking at

guarantee (via the PKI) that the server you’re about to send your banking credentials to is using a cert that belongs to the domain name in the address bar that you trust sending your credentials to.

> encryption so middleboxes can’t see what you’re looking at

> guarantee (via the PKI) that the server you’re about to send your banking credentials to is using a cert that belongs to the domain name in the address bar that you trust sending your credentials to.

The purpose of SXG is to allow publisher signing of edge-cache accelerated public content - i.e. it's read-only - not to encrypt private information like credentials in transport. Https still handles encrypted transport independently of SXG.

Also, why or how would someone create a system that accepted private info or credentials via signed SXG anyways? There's literally no mechanism in it to achieve that. If you tried to build a password entry field for your bank website and distributed it via SXG, it wouldn't even work in the first place.

> The purpose of SXG is to allow publisher signing of edge-cache accelerated public content

Is there a rule that SXG content can't contain forms or sth?

No, you can distribute whatever content you want. But the content distribution network can't listen for posts from those forms when the content is rendered.

SXG doesn't answer DNS requests for your domain. It only says that a particular piece of content has been signed using private keys that have been registered with the displayed host. That's it.

In fact, you don't even need a CDN or DNS to distribute SXG content. You could distribute it via USB drives, or code flags, USB drives attached to messenger pigeons, whatever. The point is that authenticity of the origin of the content is completely independent of how the content got to you.

When that SXG content, however it is distributed, is rendered, the browser represents that content as originating from your domain, which is in fact exactly where it originated.
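To make the "independent of how the content got to you" point concrete, here is a toy sketch. It is not how SXG is implemented: real signed exchanges use certificate-backed public-key signatures, whereas this uses a Lamport one-time signature purely because it fits in a few lines of standard-library Python. All names (`keygen`, `sign`, `verify`, `exchange_bytes`) and the example URL are made up.

```python
import hashlib
import secrets
from typing import List, Tuple

KeyPairs = List[Tuple[bytes, bytes]]


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _digest_bits(message: bytes) -> List[int]:
    """The 256 bits of SHA-256(message), most significant bit first."""
    return [(byte >> shift) & 1 for byte in _h(message) for shift in range(7, -1, -1)]


def keygen() -> Tuple[KeyPairs, KeyPairs]:
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public


def sign(private: KeyPairs, message: bytes) -> List[bytes]:
    """Reveal one secret per digest bit. A Lamport key must sign only once."""
    return [private[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(public: KeyPairs, message: bytes, signature: List[bytes]) -> bool:
    """Check each revealed secret against the public hash for that bit."""
    bits = _digest_bits(message)
    if len(signature) != len(bits):
        return False
    return all(_h(sig) == public[i][bit] for i, (sig, bit) in enumerate(zip(signature, bits)))


def exchange_bytes(url: str, body: bytes) -> bytes:
    """Bind the URL and the body together so neither can be swapped out."""
    return url.encode("utf-8") + b"\n" + body


# Publisher side (hypothetical site and content).
private_key, public_key = keygen()
url, body = "https://example.com/article", b"<p>hello</p>"
signature = sign(private_key, exchange_bytes(url, body))

# Any transport may hand over (url, body, signature): CDN, USB drive, pigeon.
print(verify(public_key, exchange_bytes(url, body), signature))
# A distributor that alters the body cannot produce a matching signature.
print(verify(public_key, exchange_bytes(url, b"<p>phish</p>"), signature))
```

The first check should pass and the second should fail. The verifier only ever needs the public key, so whoever carried the bytes has no ability to sign anything on the publisher's behalf.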

I don't think there's a real phishing risk with them, but I object to Signed Exchanges because they are actively making the browser lie to me about the URL being used.
The URL the browser shows is the one which was cryptographically verified to be correct. I don't see how you can call that a "lie".

If I'm offline and I open an offline cached page in my browser, would you call it a lie if the browser displays the URL I originally downloaded that page from in the URL bar instead of saying it came from "your hard drive"?

It's not just us HN commenters that are concerned. Mozilla, for example, is highly opposed to it in its current state.

"Mozilla has concerns about the shift in the web security model required for handling web-packaged information. Specifically, the ability for an origin to act on behalf of another without a client ever contacting the authoritative server is worrisome, as is the removal of a guarantee of confidentiality from the web security model (the host serving the web package has access to plain text). We recognise that the use cases satisfied by web packaging are useful, and would be likely to support an approach that enabled such use cases so long as the foregoing concerns could be addressed."

Mozilla has the proposal marked as "harmful".

Apple/Webkit have concerns as well: https://news.ycombinator.com/item?id=19679621

> We recognise that the use cases satisfied by web packaging are useful, and would be likely to support an approach that enabled such use cases[...]

That doesn't sound "highly opposed" to me.

Anyway, I read the full report from Mozilla back when they first published it, and while they do have some valid concerns (any new feature introduced to the web will necessarily introduce some new attack surfaces) I believe their concerns are already sufficiently well addressed by the standard.

The paragraph from Mozilla that you quoted is also rather vague and misleading. In particular:

> the ability for an origin to act on behalf of another without a client ever contacting the authoritative server is worrisome

This is super vague. I see no reason why that should be "worrisome". That sort of thing happens all the time in public key cryptography. When you receive a message signed with the private key of a trusted actor, it's perfectly reasonable to trust that the trusted actor authorized that message regardless of where the message itself came from. TLS itself already does exactly that every time you visit a website over HTTPS (your browser trusts certificates signed by a trusted CA, even though those certificates are being presented by an untrusted website, not the CA itself).

> as is the removal of a guarantee of confidentiality from the web security model

This concern is completely unfounded, and I'm surprised Mozilla included it in their summary. The use of the signed exchange standard doesn't reveal any information to any party that would not already have access to that information without the standard (a host serving you a link to a static, public page will necessarily already have access to the plaintext content of that page, regardless of whether they serve you that content themselves or not).

> That doesn't sound "highly opposed" to me

They marked the proposal as "harmful", and it remains marked that way.

I wasn't trying to exaggerate. I could cite other passages that support "highly opposed".

Mozilla did publish a pretty extensive document that explains their position and plans: https://docs.google.com/document/d/1ha00dSGKmjoEh2mRiG8FIA5s...

> I don't see how you can call that a "lie".

It's a lie because the URL being displayed does not reflect the source of the bits.

> If I'm offline and I open an offline cached page in my browser, would you call it a lie if the browser displays the URL I originally downloaded that page from

That's a bit of a gray area. Yes, it is a lie (the browser should provide an indication of the actual source of the bits). On the other hand, the cache was created by you and exists on your own machine, so it's more of a little white lie in that case.

What about something like Cloudflare, would you say they're lying when they return a cached file instead of contacting the origin server?
Yes, because Cloudflare isn't telling me that it's coming from them. However, that's already a lost battle.
How is it no longer safe?
Phishing
They can't alter the content - that's where the 'signed' part comes in. Any forms there would still go to the original source.
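A small sketch of why the form target stays put, assuming the browser treats the signed URL as the document's URL (which is what showing it in the address bar implies). Relative form actions are resolved against that URL, not against whichever cache delivered the bytes. The URL, the action, and the helper name are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def form_target(document_url: str, action: str) -> str:
    """Resolve a form's action attribute against the document's URL."""
    return urljoin(document_url, action)


signed_url = "https://mybanksite.com/login"  # URL covered by the signature (hypothetical)
target = form_target(signed_url, "/session")

print(target)
print(urlsplit(target).hostname)
```

Since the action attribute is part of the signed body, a distributor cannot rewrite it to point elsewhere without invalidating the signature.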