Hacker News
by buboard 2382 days ago
is this served from politico's servers and how is it different from a stripped down version of their site?
2 comments

It's not. It's using "Signed Exchanges", which Chrome supports, but most other browsers do not.

It's just AMP with some crypto that lets Google masquerade as your domain.

Correction: It lets anyone cache your page, not just Google. And no "masquerading"; that's what the crypto is designed to prevent. Also it's not specific to AMP; you can use signed exchanges with any data served over HTTPS.
It's effectively just Google since it's not widely supported by browsers other than Chrome. There's also only one CA provider that can create the right certificate for SXG.

Or maybe you have some notable examples of SXG being used in a production non-AMP scenario?

The standard is brand new, and AMP was the motivating factor for its creation, so obviously the majority of existing use cases are AMP-related. That doesn't mean you couldn't go and implement a non-AMP use case in your own production site today.
One interesting use case for SXG is to allow decentralised and offline websites, since the site's data can be tied to a key/certificate/domain without having to be downloaded from a specific server. As an example, the IPFS project is already trialling the technology:

https://github.com/ipfs/in-web-browsers/issues/121

tying it to a domain name (which is the typical use of the URL) breaks the web though. i could understand if the key is used to show that the origin is a twitter account handle or something, but breaking the semantics of the domain by signing the content doesn't make any functional sense. Other than putting lipstick on a pig (AMP) of course
oh wow, Signed Exchanges are worse than AMP!

"make sure you are visiting mybanksite.com" is no longer safe.

> oh wow, Signed Exchanges are worse than AMP!

> "make sure you are visiting mybanksite.com" is no longer safe.

Sounds like you don't trust public key based content signing. This is just broadening public key based signatures beyond the domain to include the domain and the content itself, and using signing to make the authenticity of the content independent of the physical infrastructure that served it.

That's what's being used here to verify the authenticity of the content's source, just like PGP/GPG does for signed emails.

That's a far stronger guarantee than "the data is authentic because it came from IP address range X purchased by company Y".

In fact, without such a signature, there is no guarantee that a piece of content is authentic just because it came from a particular server/datacenter.

With signed exchanges, the chain of authenticity is pushed all the way back to the website's content creators - it doesn't stop at the web server. Also, this can't be phished unless you break the content signing algorithms, and if that happens ... we all have bigger problems.
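To make the "authenticity independent of the delivery path" point concrete, here is a toy sketch of publisher-side signing and client-side verification. It is only an illustration under loud assumptions: it uses textbook RSA built from two well-known Mersenne primes, which is not secure and is not the scheme the signed exchange standard specifies, and the names `sign`, `verify` and `digest` are made up for this example.

```python
import hashlib

# Toy key pair, for illustration only. Textbook RSA without padding is NOT
# secure, and real signed exchanges use a standardized signature scheme.
P = 2**61 - 1                      # Mersenne prime
Q = 2**89 - 1                      # Mersenne prime
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest(url: str, body: bytes) -> int:
    """Hash the URL and the body together, so the signature covers both."""
    h = hashlib.sha256(url.encode("utf-8") + b"\n" + body).digest()
    return int.from_bytes(h, "big") % N


def sign(url: str, body: bytes) -> int:
    """Publisher side: requires the private exponent D."""
    return pow(digest(url, body), D, N)


def verify(url: str, body: bytes, signature: int) -> bool:
    """Client side: requires only the public values N and E."""
    return pow(signature, E, N) == digest(url, body)
```

Nothing in `verify` asks which server or cache delivered `body`. A cache that alters the bytes, or that claims a different URL for them, produces a digest that no longer matches the signature.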

first, it breaks the URL specification, as the "host" is no longer a host. it breaks user's expectation of one of the VERY FEW things that everyday users understand about the internet.

one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache, and then use it to phish customers from within the bank's domain. Or just use a stolen key to make thousands of such pages before the bank finds out. I think, contrary to what you say, it's a brand new, major attack surface.

> first, it breaks the URL specification, as the "host" is no longer a host.

By this definition, "host" hasn't been a host in a long time, not since it became possible for a single DNS name to resolve to multiple IP addresses, possibly in different datacenters.

> it breaks user's expectation of one of the VERY FEW things that everyday users understand about the internet.

How is signing content directly less authentic than signing only at the web server? Signing content directly at the time of publishing ensures that it was created using the private keys of the entity in question, regardless of the delivery mechanism for the content.

> one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache,

Signed content exchanges specifically limit that by putting the content signing step at the content creator level, not the web server level. Unless you steal the content creator's private keys, you can't represent your content as theirs.

> "host" hasn't been a host in a long time,

Does SXG make this better or worse?

> ensures that it was created using the private keys

signing at the server ensures that it was created using the key AND served from a host they control. How is that not better?

> you can't represent your content

wouldn't the server sign all http responses by default? all you would need to do is upload a file

> first, it breaks the URL specification, as the "host" is no longer a host.

Really, how so? RFC 3986 goes out of its way to make clear that the "host" component doesn't mean DNS, and doesn't even have to denote a host.

"In other cases, the data within the host component identifies a registered name that has nothing to do with an Internet host."

"A URI resolution implementation might use DNS, host tables, yellow pages, NetInfo, WINS, or any other system for lookup of registered names."

> it breaks user's expectation of one of the FEW things that everyday users understand about the internet.

What, exactly and concretely, is that expectation?

> one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache, and then use it to phish customers from within the bank's domain.

If the attacker can upload arbitrary pages to the bank's website, just why would they need signed exchanges? They've already got their phishing page on the correct domain.

> RFC 3986 goes out of its way

the RFC uses the word "host" and not "signer". It also says that the "host" is intended to be looked up in some service registry, and there is no such thing for arbitrary signers.

> exactly and concretely, is that expectation

One common piece of security advice banks used to give is "check your browser address that you are in our server"

> just why would they need signed exchanges

with signed exchanges they can fool amp into caching the page long after it has been deleted from the server

> one may manage to upload an html file to the bank's server and serve a -signed- page that google amp will cache

Only if you have the bank's private key, and the ability to serve arbitrary content from the bank's domain. In which case... yeah, I don't see how the signed exchanges standard makes that problem significantly worse.

i don't know what's the max expiration for amp's cache, but i could set a really-long expiration date on the file and remove it from the server without the bank ever knowing it existed. SXG doesn't even require an upload - one disgruntled employee could do the same with a stolen key.

Nobody benefits from this shit other than google. Do we really need more attack surfaces?

I hadn't realized the content was actually signed; I assumed we were simply trusting Google to send us the content they said they were sending (much like we do when using the Google cache). I'm curious now: would it be possible to use the content/markup intended for use by the amp cache to view a static/unscripted/readable version of the page's main content? If so, why hasn't anyone built a browser extension to do so?

On a broader note, this also sounds like it could be used to allow caching proxies to work with https; you'd lose the privacy, but you'd gain from being able to cache content on local network if the browser only had to verify the content, and you trusted the cache not to spy on you.

> I'm curious now: would it be possible to use the content/markup intended for use by the amp cache to view a static/unscripted/readable version of the page's main content? If so, why hasn't anyone built a browser extension to do so?

If the goal is to get around the AMP CDN, you don't even need to read the main page content. The AMP URL contains the original source URL itself [1].

The extension you are describing would just need to capture all requests with the prefix https://www.google.com/amp (or whatever CDN you are trying to get around), parse out the original URL, and then fetch it, and do what you will with it.
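A rough sketch of the "parse out the original URL" step, written in Python rather than as an actual extension. It assumes the Google AMP viewer layout described in [1], where the source URL follows the `/amp/` prefix and a leading `s/` marks an HTTPS source. The function name `original_url` is made up for this example, and other AMP caches use different layouts that this does not handle.

```python
from typing import Optional
from urllib.parse import urlsplit

AMP_HOST = "www.google.com"  # the only CDN handled in this sketch
AMP_PREFIX = "/amp/"


def original_url(amp_url: str) -> Optional[str]:
    """Return the publisher URL embedded in a Google AMP viewer URL.

    Returns None when the input does not look like such a URL.
    """
    parts = urlsplit(amp_url)
    if parts.netloc != AMP_HOST or not parts.path.startswith(AMP_PREFIX):
        return None

    rest = parts.path[len(AMP_PREFIX):]

    # Assumption: a leading "s/" means the source was served over HTTPS.
    scheme = "http"
    if rest.startswith("s/"):
        scheme = "https"
        rest = rest[len("s/"):]

    if not rest:
        return None

    # The query string is dropped on purpose: this sketch cannot tell the
    # viewer's own parameters apart from the publisher's.
    return f"{scheme}://{rest}"
```

For example, `original_url("https://www.google.com/amp/s/www.example.com/amp/doc.html")` gives `https://www.example.com/amp/doc.html`, which the extension could then fetch directly.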

If the goal is to disable scripting on the AMP CDN delivered content, first note that AMP pages can't contain page-author-written JS [2], and any implicit JS has to run async.

But if that's insufficient, you can disable JS in the browser altogether, which would disable it in the loaded AMP content.

You could also try to parse out the main content from your extension from the AMP page if you know from the URL that it's an AMP page. Because AMP forces relative terseness and simplicity of HTML content, it is probably easier to parse than the original page's content. Obviously that won't generalize easily given the large variety of possible content representations, but you stand a better chance of achieving this with AMP content than the original content.

And if you generalize it enough, you will end up with one component of a web crawl / indexing system in an extension ;)

1. https://blog.amp.dev/2017/02/06/whats-in-an-amp-url

2. https://amp.dev/about/how-amp-works/

I’m not sure you understand the purpose of https. Ensuring integrity of the document served by the server is only one small piece of it.

The other critical components are:

encryption so middleboxes can’t see what you’re looking at

guarantee (via the PKI) that the server you’re about to send your banking credentials to is using a cert that belongs to the domain name in the address bar that you trust sending your credentials to.

> encryption so middleboxes can’t see what you’re looking at

> guarantee (via the PKI) that the server you’re about to send your banking credentials to is using a cert that belongs to the domain name in the address bar that you trust sending your credentials to.

The purpose of SXG is to allow publisher signing of edge-cache accelerated public content - i.e. it's read-only - not to encrypt private information like credentials in transport. Https still handles encrypted transport independently of SXG.

Also, why or how would someone create a system that accepted private info or credentials via signed SXG anyways? There's literally no mechanism in it to achieve that. If you tried to build a password entry field for your bank website and distributed it via SXG, it wouldn't even work in the first place.

> The purpose of SXG is to allow publisher signing of edge-cache accelerated public content

Is there a rule that SXG content can't contain forms or sth?

I don't think there's a real phishing risk with them, but I object to Signed Exchanges because they are actively making the browser lie to me about the URL being used.
The URL the browser shows is the one which was cryptographically verified to be correct. I don't see how you can call that a "lie".

If I'm offline and I open an offline cached page in my browser, would you call it a lie if the browser displays the URL I originally downloaded that page from in the URL bar instead of saying it came from "your hard drive"?

It's not just us HN commenters that are concerned. Mozilla, for example, is highly opposed to it in its current state.

"Mozilla has concerns about the shift in the web security model required for handling web-packaged information. Specifically, the ability for an origin to act on behalf of another without a client ever contacting the authoritative server is worrisome, as is the removal of a guarantee of confidentiality from the web security model (the host serving the web package has access to plain text). We recognise that the use cases satisfied by web packaging are useful, and would be likely to support an approach that enabled such use cases so long as the foregoing concerns could be addressed."

Mozilla has the proposal marked as "harmful".

Apple/Webkit have concerns as well: https://news.ycombinator.com/item?id=19679621

> We recognise that the use cases satisfied by web packaging are useful, and would be likely to support an approach that enabled such use cases[...]

That doesn't sound "highly opposed" to me.

Anyway, I read the full report from Mozilla back when they first published it, and while they do have some valid concerns (any new feature introduced to the web will necessarily introduce some new attack surfaces) I believe their concerns are already sufficiently well addressed by the standard.

The paragraph from Mozilla that you quoted is also rather vague and misleading. In particular:

> the ability for an origin to act on behalf of another without a client ever contacting the authoritative server is worrisome

This is super vague. I see no reason why that should be "worrisome". That sort of thing happens all the time in public key cryptography. When you receive a message signed with the private key of a trusted actor, it's perfectly reasonable to trust that the trusted actor authorized that message regardless of where the message itself came from. TLS itself already does exactly that every time you visit a website over HTTPS (your browser trusts certificates signed by a trusted CA, even though those certificates are being presented by an untrusted website, not the CA itself).

> as is the removal of a guarantee of confidentiality from the web security model

This concern is completely unfounded, and I'm surprised Mozilla included it in their summary. The use of the signed exchange standard doesn't reveal any information to any party that would not already have access to that information without the standard (a host serving you a link to a static, public page will necessarily already have access to the plaintext content of that page, regardless of whether they serve you that content themselves or not).

> I don't see how you can call that a "lie".

It's a lie because the URL being displayed does not reflect the source of the bits.

> If I'm offline and I open an offline cached page in my browser, would you call it a lie if the browser displays the URL I originally downloaded that page from

That's a bit of a gray area. Yes, it is a lie (the browser should provide an indication of the actual source of the bits). On the other hand, the cache was created by you and exists on your own machine, so it's more of a little white lie in that case.

What about something like Cloudflare, would you say they're lying when they return a cached file instead of contacting the origin server?
How is it no longer safe?
Phishing
They can't alter the content - that's where the 'signed' part comes in. Any forms there would still go to the original source.
I believe it's done via signed exchange. You are free to host it wherever, I think.

https://amp.dev/documentation/guides-and-tutorials/optimize-...