You've forgotten that security includes availability, in addition to confidentiality and integrity. Interesting design choice for the entity which runs the emergency broadcasting system.
You're saying you'd prefer for e.g. NOAA to not be able to issue tornado warnings in order to ensure nobody can fake a tornado warning.
I think what konklone was getting at is that any scenario that allows an attacker to trigger a certificate warning (effectively taking down the service) would also allow them to take down the service through other means. Do you have a scenario in mind that doesn't require either a MitM (who could just as well block the service) or a compromised client/server (which would allow the attacker to block access either way)?
This response implies that the reduced availability has to be malicious, but it could be unintentional as well.
ex:
During an emergency I connect to public wifi because mine is not working. That wifi has a MITM proxy installed by the owner (because they want to serve ads over HTTPS, or it's a developer's wifi and they were testing with something like Charles Proxy, etc.). This page is now unavailable during an emergency. That's a lack of availability without malicious intent.
The general assumption for HSTS is that, in all cases, it's better to be unavailable than have the possibility of compromise. I'm unsure if that's the case for critical services in times of need.
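For anyone unfamiliar with why HSTS removes the click-through option, here is a rough Python model of what a browser does with the header. It is a simplified sketch, not any browser's actual code: `HstsStore`, `parse_hsts` and `cert_error_is_bypassable` are hypothetical names, `example.gov` is a placeholder host, and preload lists and other details are skipped. The rule it encodes is the one from RFC 6797: once a host is a known HSTS host, a certificate error is a hard failure with no user recourse.

```python
def parse_hsts(header: str) -> dict:
    """Split a Strict-Transport-Security value into lowercase directive names (simplified)."""
    directives = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        # Valueless directives such as includeSubDomains are stored as True.
        directives[name.strip().lower()] = value.strip().strip('"') or True
    return directives


class HstsStore:
    """Hypothetical per-client record of known HSTS hosts."""

    def __init__(self):
        self.entries = {}  # host -> (expires_at, include_subdomains)

    def note(self, host: str, header: str, now: int) -> None:
        d = parse_hsts(header)
        if "max-age" not in d:
            return  # max-age is required; ignore a header without it
        max_age = int(d["max-age"])
        if max_age == 0:
            self.entries.pop(host, None)  # max-age=0 tells the client to forget the host
        else:
            self.entries[host] = (now + max_age, "includesubdomains" in d)

    def is_known(self, host: str, now: int) -> bool:
        labels = host.split(".")
        for i in range(len(labels)):
            entry = self.entries.get(".".join(labels[i:]))
            # A parent domain only covers this host if it set includeSubDomains.
            if entry and now < entry[0] and (i == 0 or entry[1]):
                return True
        return False


def cert_error_is_bypassable(store: HstsStore, host: str, now: int) -> bool:
    # Known HSTS host: any TLS error is fatal, so there is no "just click allow".
    return not store.is_known(host, now)


store = HstsStore()
store.note("example.gov", "max-age=31536000; includeSubDomains", now=0)
print(cert_error_is_bypassable(store, "weather.example.gov", now=10))  # False
print(cert_error_is_bypassable(store, "example.com", now=10))          # True
```

The `includeSubDomains` walk is why an HSTS policy set on a parent domain covers every host beneath it.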
Well, it doesn't have to be an attack per se. Maybe the client's clock is wrong, which actually happens a lot. Or an admin error when replacing the cert on the server. There are of course lots of ways admin error can take down a server, but HTTPS adds some fun possibilities that are easier to trigger and harder to recover from.
Bonus points for client clock error. If I had a nickel for every time...
The best is when it's a timezone issue and the distant end responds with "I have 0 drift, must be a problem on your end". Crypto is hard, time is hard. Crypto which relies on time...
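To make the clock point concrete: a TLS client can only judge a certificate's validity period against its own clock. The Python sketch below uses a made-up certificate and made-up clock errors (the `cert_time_valid` helper is hypothetical) to show how a freshly issued cert looks "not yet valid" to a client whose clock is a few hours behind, for example one treating local time as UTC, even though nothing is wrong on the server.

```python
from datetime import datetime, timedelta, timezone


def cert_time_valid(not_before: datetime, not_after: datetime, client_now: datetime) -> bool:
    # The client rejects the certificate if its own idea of "now" falls
    # outside [notBefore, notAfter]; it has no independent source of time.
    return not_before <= client_now <= not_after


# Hypothetical certificate, issued moments before an admin swapped it in.
not_before = datetime(2016, 6, 1, 12, 0, tzinfo=timezone.utc)
not_after = not_before + timedelta(days=90)

true_now = not_before + timedelta(minutes=30)
# Client that treats its local time (UTC-5) as if it were UTC: 5 hours behind.
tz_confused_now = true_now - timedelta(hours=5)
# Client whose clock reset to an old date (dead CMOS battery, say).
reset_now = datetime(2010, 1, 1, tzinfo=timezone.utc)

for label, now in [("correct", true_now), ("tz-confused", tz_confused_now), ("reset", reset_now)]:
    print(label, cert_time_valid(not_before, not_after, now))
# correct True
# tz-confused False
# reset False
```

Note that the timezone-confused client only breaks near the edges of the validity window, which is exactly when a cert has just been replaced.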
I mean, sure, there are more things that can go wrong once you add TLS to the stack. At the same time, there are so many other guns to shoot yourself in the foot with, so why is it that we should draw the complexity trade-off line between HTTP and HTTPS? HTTPS seems to be good enough for 50% of all page loads nowadays. There's no active attack scenario here (which I agree would be a concern for critical services!), and for every possible TLS server or client issue, there are a multitude of other server, network or browser issues that could have a similar effect.
The point of this thread has been that adding additional complexity, whatever its form, makes services more fragile. You might not be aware of this, but there was recently a Treasury CA delegated from the Federal Common Policy root CA whose cert expired. This caused every system downstream to have to go through and update their CA bundles. There was significant pain because systems with HSTS enabled trying to connect to web services with the wrong cert bundle caused exactly the type of outage we've been discussing. This is not a hypothetical; there were systems with days/weeks of downtime caused by (mostly) human error. The fact that other things can go wrong too does not mean that things going wrong because of HTTPS isn't a problem. It's a trade-off, like everything in security.
Managing certs is work. People get it wrong sometimes. Mandatory HSTS means no "just click allow" safety net. This decision takes away the ability to accept that risk for systems where availability is more important.
If I screw up max connections or keep alive or some such in nginx.conf I can revert that change with downtime limited to the duration of the bad change. Screw up HPKP with a bad cert roll and you can't just revert. Users will be bifurcated into before and after groups, and you can't fix that without waiting it out.
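The "bifurcated into before and after groups" point can be modelled in a few lines. This is a deliberately simplified Python sketch of HPKP-style pin caching, not a faithful RFC 7469 implementation: it ignores the backup-pin requirement, includeSubDomains and reporting, and `Client`, the key labels "A"/"C" and `example.gov` are all hypothetical. Each client remembers the pins from its last successful visit until `max-age` runs out, so reverting the server just locks out a different group.

```python
from dataclasses import dataclass

DAY = 86400
MAX_AGE = 60 * DAY  # hypothetical pin lifetime


@dataclass
class PinEntry:
    pins: frozenset   # key hashes this client will accept for the host
    expires_at: int   # seconds since an arbitrary epoch


class Client:
    def __init__(self):
        self.pin_store = {}  # host -> PinEntry

    def visit(self, host, served_key, advertised_pins, max_age, now):
        entry = self.pin_store.get(host)
        if entry is not None and now < entry.expires_at and served_key not in entry.pins:
            return False  # pin mismatch: hard failure, and the server can't undo it
        # Accepted: remember whatever pins the server advertises right now.
        self.pin_store[host] = PinEntry(frozenset(advertised_pins), now + max_age)
        return True


before, after = Client(), Client()
before.visit("example.gov", "A", {"A"}, MAX_AGE, now=0)            # day 0: key A, pin A
print(before.visit("example.gov", "C", {"C"}, MAX_AGE, now=1 * DAY))  # False: bad roll to C
print(after.visit("example.gov", "C", {"C"}, MAX_AGE, now=1 * DAY))   # True: new visitor pins C
print(before.visit("example.gov", "A", {"A"}, MAX_AGE, now=2 * DAY))  # True: admin reverts to A
print(after.visit("example.gov", "A", {"A"}, MAX_AGE, now=2 * DAY))   # False: now "after" is stuck
print(after.visit("example.gov", "A", {"A"}, MAX_AGE, now=62 * DAY))  # True: pin finally expired
```

Contrast with an nginx.conf mistake: there, fixing the server fixes every client on the next request.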
Oh, HPKP is definitely something you'll want to think about hard before committing to. Getting a publicly-trusted certificate from any of the myriad of CAs out there, on the other hand, isn't rocket science.
You might want to re-read my post more carefully; there is not necessarily an attacker per se in an availability incident (although there certainly could be. Depends on how evil one wants to think.).
Backhoe eats the fiber to the OCSP responder and CRL distribution point, CRLs time out after 24 hours.
> You might want to re-read my post more carefully; there is not necessarily an attacker per se in an availability incident (although there certainly could be. Depends on how evil one wants to think.).
Well, that was the context of this thread. Both the OP and konklone are talking about attack surface. If you want to talk about how running a service via TLS and using HSTS makes HA harder, that's a different discussion.
> Backhoe eats the fiber to the OCSP responder and CRL distribution point, CRLs time out after 24 hours.
OCSP and CRL checks are soft-fail by default in every browser I'm aware of. The server is also in control of it via OCSP stapling, so it has all the tools it needs to keep the service available, assuming proper configuration and monitoring (which is true for an HTTP service as well).
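Here is a toy Python version of the decision a client makes in the backhoe scenario, just to pin down what "soft-fail" and "stapling" change. The `Status` enum and both functions are hypothetical and ignore real-world details like response freshness and signature checks. A policy that forces hard-fail on online checks flips only the `UNREACHABLE` case; a valid stapled response sidesteps the responder entirely.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNREACHABLE = "unreachable"  # responder/CRL endpoint down, timed out, etc.


def revocation_status(stapled: Optional[Status], online_query: Status) -> Status:
    # If the server stapled a response, the client uses it and never
    # contacts the CA, so a responder outage doesn't reach the client.
    if stapled is not None:
        return stapled
    return online_query


def accept(status: Status, hard_fail: bool) -> bool:
    if status is Status.REVOKED:
        return False          # every policy rejects a known-revoked certificate
    if status is Status.UNREACHABLE:
        return not hard_fail  # soft-fail carries on; hard-fail blocks the connection
    return True


down = Status.UNREACHABLE  # the backhoe got the fiber
print(accept(revocation_status(None, down), hard_fail=False))        # True  (soft-fail)
print(accept(revocation_status(None, down), hard_fail=True))         # False (hard-fail)
print(accept(revocation_status(Status.GOOD, down), hard_fail=True))  # True  (stapled)
```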
Is the backhoe/squirrel/hurricane an attacker thus making this an "attack"? Semantics. Availability is part of the attack _surface_, which if we're being pedantic is what was being discussed. ("Look at the shiny new attack surface!")
> different discussion
My point is that, no, it's not. The three points of the triad are inextricably linked. More C and/or I means less A (and A tends to be sidelined in favor of C and I these days).
> OCSP and CRL checks are soft-fail by default in every browser I'm aware of.
Not on government systems they aren't (STIG id: v-44789). Also, if we're going all in on https we should go all in on https.
> ... Stapling
How is the server supposed to get a response to staple if the responder is unavailable?
Also, time. Also, client root of trust. Also, fat-fingering the hostname when the DNS gets updated. Also, public wifi which does MitM...
Bottom line: this is a decision which prioritizes confidentiality and integrity over availability for the entire .gov with (seemingly) no recourse.
To quote my comment from above - bear in mind that when it comes to plain HTTP, it's not just the system's confidentiality and integrity that you need to weigh against availability: it's the user's confidentiality and integrity.
That's a larger moral responsibility, in my opinion. And consider that the fallback to prioritize availability in case of a non-attack cert error (e.g. revocation or expiration) is to ask the user to look at a certificate warning and make a personal trust decision about it. There are precious few users who can safely make that kind of a decision. And even if they "get it right" that time and click through and aren't attacked, you're training users to click through warnings, and helping them subject themselves to attacks in the future.
I would argue that that kind of "availability" is a very weak sort of availability. The government has enough problems with training people to click through certificate warnings (see: https://www.iad.gov) -- intentionally leaving that hole open seems unwise.
> Not on government systems they aren't (STIG id: v-44789). Also, if we're going all in on https we should go all in on https.
I found this description: "By setting this policy to true, the previous behavior is restored and online OCSP/CRL checks will be performed. If the policy is not set, or is set to false, then Chrome will not perform online revocation checks. [...]"
This seems to address the fact that Chrome does not perform OCSP queries at all, instead relying on its CRLSets. However, even back when Chrome did OCSP queries, it was soft-fail (as is every other browser). The "previous behavior" would thus be to query OCSP, but fail silently anyway.
> How is the server supposed to get a response to staple if the responder is unavailable?
OCSP responses from publicly-trusted CAs are typically valid for 10 days, and they're updated at least once every 4 days (IIRC). That'll leave 6 days for the responder to come back online in the worst case (or 6 days to tell everyone about "badidea" in case the CA is nuked from orbit, along with any other publicly-trusted CA the site might switch to). (Let's not forget it's soft-fail, so this is just a theoretical exercise).
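The 6-day figure is just validity minus refresh interval, but it's easy to check by brute force. This Python sketch takes the numbers from the comment above (10-day validity, 4-day refresh, both recalled "IIRC", so treat them as assumptions), and assumes the server refetches on a fixed schedule where a fetch only succeeds if it happens before the outage begins. `days_until_staple_expires` is a hypothetical helper.

```python
import math

VALIDITY = 10.0  # days an OCSP response stays valid (figure from the comment)
REFRESH = 4.0    # days between the server's refetches (figure from the comment)


def days_until_staple_expires(outage_start: float) -> float:
    """Days from the responder going down until the last good staple expires.

    Assumes fetches at t = 0, REFRESH, 2*REFRESH, ... and that a fetch at
    time t succeeds only if t < outage_start (outage_start must be > 0).
    """
    last_fetch = (math.ceil(outage_start / REFRESH) - 1) * REFRESH
    return last_fetch + VALIDITY - outage_start


starts = [0.5 * k for k in range(1, 81)]  # outage beginning anywhere in (0, 40] days
print(min(days_until_staple_expires(s) for s in starts))  # 6.0
print(days_until_staple_expires(0.5))                     # 9.5
```

The worst case lands exactly on a scheduled refetch: the staple in hand is already 4 days old, leaving 10 - 4 = 6 days.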
> Bottom line: this is a decision which prioritizes confidentiality and integrity over availability for the entire .gov with (seemingly) no recourse.
I'll give you that. I just don't think the availability concerns are bad enough to outweigh the benefits, and they can be mitigated in just about any scenario.