by comex 5080 days ago
Removing cookies from a protocol which is otherwise fully compatible with HTTP/1, in the sense of being able to be interposed as a proxy or substituted in the web server without breaking apps, is a terrible idea.

> Cookies are, as the EU Commission correctly noted, fundamentally flawed, because they store potentially sensitive information on whatever computer the user happens to use, and as a result of various abuses and incompetences, the EU felt compelled to legislate a "notice and announce" policy for HTTP cookies.

> But it doesn't stop there: The information stored in cookies has potentially very high value for the HTTP server, and because the server has no control over the integrity of the storage, we are now seeing cookies being crypto-signed, to prevent forgeries.

Anyone with a grain of skill is capable of using cookies as identifiers only; it's hard to see what cookies vs identifiers has to do with "notice and announce" or security. An explicit session mechanism could provide benefits over using cookies for the same purpose, but what exactly would removing cookies achieve other than breaking the world?


I agree that anybody with sufficient clue can and will use cookies as id only.

Unfortunately such people are evidently few and far between.

Banning cookies and having the client offer a session identifier instead solves many problems.

For starters, it stores the data where it belongs: On the server, putting the cost of storage and protection where it belongs too.

This is a win for privacy, as you will know if you have ever taken the time to actually examine the cookies on your own machine.

Second, it allows the client+user to decide if it will issue anonymous (ie: ever-changing) session identifiers, as a public PC in a library should do, or issue a stable user-specific session-id, to get the convenience of being recognized by the server without constant re-authorization.

Today users don't have that choice, since they have no realistic way of knowing which cookies belong to a particular website due to 3rd-party cookies and image-domain splitting etc.

Network-wise, we eliminate a lot of bytes to send and receive.

One of the major improvements SPDY has shown is getting the entire request into one packet (by deflating all the headers).

But the only reason HTTP requests don't fit in a single packet to begin with is cookies; get rid of cookies, and almost all requests fit inside the first MTU.

Finally, eliminating cookies improves caching opportunities, which will help both client and server side get a better web experience.

As for breaking the world: It won't happen.

It is trivial to write a module for Apache which simulates cookies for old HTTP/1 web-apps: Simply store/look up the cookies in a local database table, indexed by the session-id the client provided.

I'm sure sysadmins will have concerns about the size of that table, but that is an improvement: today the cost is borne by the web users.
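
A minimal sketch of the kind of shim described above, assuming an SQLite table keyed by the client-supplied session-id. The class name `CookieShim`, the table layout, and the method names are hypothetical; a real Apache module would hook into request handling rather than being called by hand.

```python
import sqlite3


class CookieShim:
    """Hypothetical server-side cookie jar, indexed by client session-id."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cookies ("
            " session_id TEXT NOT NULL,"
            " name TEXT NOT NULL,"
            " value TEXT NOT NULL,"
            " PRIMARY KEY (session_id, name))"
        )

    def store(self, session_id, set_cookies):
        """Keep the legacy app's Set-Cookie pairs on the server instead of sending them."""
        with self.db:  # commits on success
            self.db.executemany(
                "INSERT OR REPLACE INTO cookies VALUES (?, ?, ?)",
                [(session_id, name, value) for name, value in set_cookies.items()],
            )

    def cookie_header(self, session_id):
        """Rebuild the Cookie header value the legacy app expects to see."""
        rows = self.db.execute(
            "SELECT name, value FROM cookies WHERE session_id = ? ORDER BY name",
            (session_id,),
        ).fetchall()
        return "; ".join(f"{name}={value}" for name, value in rows)
```

The shim would call `store()` on the way out and `cookie_header()` on the way in, so the app never notices that nothing was sent to the client. It does nothing for client-side code that reads cookies directly.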

> This is a win for privacy, as you will know if you have ever taken the time to actually examine the cookies on your own machine.

Most of the cookies I've seen are some kind of hash.

> Second, it allows the client+user to decide if it will issue anonymous (ie: ever-changing) session identifiers, as a public PC in a library should do, or issue a stable user-specific session-id, to get the convenience of being recognized by the server without constant re-authorization.

> Today users don't have that choice, since they have no realistic way of knowing which cookies belong to a particular website due to 3rd-party cookies and image-domain splitting etc.

I don't see how this makes sense - what's the difference?

Assuming that the session identifier is different between sites (if it's not, then the user has no option to "remove cookies" for a single domain without deauthenticating everywhere, and it's harder to determine which sites are tracking you):

- There will still be third party domains involved, since advertisers will still want to correlate traffic between domains;

- Sending a new session identifier with every request won't be practical, because you won't be able to log in, but users will be able to set their browsers to send a new identifier when the window is closed or whatever... just as they could currently configure their browser to clear cookies at that time.

Also, anyone who wants to abuse cookies can just use localStorage.

> But the only reason HTTP requests don't fit in a single packet to begin with is cookies; get rid of cookies, and almost all requests fit inside the first MTU.

Surely it's still useful to deflate things (user-agent...), though, and then what does it matter?

> Finally, eliminating cookies improves caching opportunities, which will help both client and server side get a better web experience.

How so? The server is perfectly justified in sending different content based on the session identifier, so wouldn't a proxy have to assume it would?

But if you want to say the result doesn't depend on cookies, can't you just set a Vary header?

> It is trivial to write a module for Apache which simulates cookies for old HTTP/1 web-apps: Simply store/look up the cookies in a local database table, indexed by the session-id the client provided.

Eh... okay. This still breaks anything that uses JavaScript to interact with the cookies.

The client/user-agent gets to control what session-id gets sent to which sites.

That makes it possible for a UI design where the user can press a button and say "don't surf this site anonymously" with the default being a new random session-id for all other sites.

That will make tracking and correlation of webusage much harder, which I really don't see a downside to.
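
A rough sketch of the client-side policy described above: a random identifier by default, and a stable one only for sites the user has opted into. The class and method names are hypothetical, and issuing a fresh id on every call is an assumption; a browser might rotate per tab or per browsing session instead.

```python
import secrets


class SessionIdPolicy:
    """Hypothetical user-agent policy for choosing the session-id sent to a site."""

    def __init__(self):
        # site -> long-lived id, only for sites the user explicitly chose
        self.stable = {}

    def remember(self, site):
        """The "don't surf this site anonymously" button."""
        self.stable.setdefault(site, secrets.token_hex(16))

    def forget(self, site):
        """Go back to anonymous browsing for this site."""
        self.stable.pop(site, None)

    def session_id(self, site):
        """Stable id for opted-in sites, a fresh random id for everything else."""
        if site in self.stable:
            return self.stable[site]
        return secrets.token_hex(16)
```

Because the ids are generated by the client per site, two sites cannot correlate a user by comparing identifiers unless the user opted in to both and the sites share some other signal.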

Deflate is bad on its own; it is a DoS amplifier, and it makes the job of load balancers much more resource-intensive, because they have to retain compression state for all connections and spend CPU and memory on the inflation.

The server is perfectly justified in customizing content, and we have a header for saying that is the case: Cache-Control.

The problem with cookies is that they disable caching of everything on the site, including favicon.ico, and there is nothing the server can do about it, because the cookies are sent on all requests.

Javascript will also have access to the session-id.

> That makes it possible for a UI design where the user can press a button and say "don't surf this site anonymously" with the default being a new random session-id for all other sites.

This is already possible, just give tabs their own cookie context by default. (Browsers don't make this the default, but they all have some variant of "incognito mode" already...)

> The problem with cookies is that they disable caching of everything on the site, including favicon.ico, and there is nothing the server can do about it, because the cookies are sent on all requests.

I admit that I don't know much about HTTP caching, but I don't see why the Cookie header would inhibit caching. (Edit: Isn't the purpose of the Vary header to specify which request headers affected the result, including Cookie?)

(probably pressing wrong reply link here ?)

Cookies are almost never mentioned in Vary, so all caches have to assume that the presence of cookies means non-cacheable.
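
To make the caching argument concrete, here is a sketch of the conservative policy described above, written as a cache-key function for a shared cache. The function name and signature are hypothetical, and real caches apply many more rules (Cache-Control, freshness, method checks) than this shows.

```python
def cache_key(method, url, request_headers, vary):
    """Return a key for a shared cache, or None if the response should not be stored.

    `vary` is the value of the response's Vary header ("" if absent).
    """
    req = {name.lower(): value for name, value in request_headers.items()}
    varied = [h.strip().lower() for h in vary.split(",") if h.strip()]

    if "*" in varied:
        # "Vary: *" means the response depends on things the cache cannot see.
        return None
    if "cookie" in req and "cookie" not in varied:
        # The server did not say whether the cookie mattered, so assume it did.
        return None

    # Otherwise the key includes exactly the request headers the server listed.
    return (method, url, tuple((h, req.get(h, "")) for h in sorted(varied)))
```

Under this policy a request for `/favicon.ico` that carries a cookie is never served from cache unless the server lists `Cookie` in `Vary`, and in that case every distinct cookie value gets its own cache entry.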

Hmm. That would indeed be an advantage of a new mechanism.

What I meant about JavaScript is that a server side cookie->session bridge for legacy code would not work in general because corresponding client side code sometimes expects to be able to see the cookies in document.cookie.

An identifier has privacy disadvantages over a cookie with the same duration. The least privacy you have is when the server has a unique identifier for you: then they can do whatever they want. With a cookie the site has an option to store only what they need, instead of something unique. For example if I'm running an a/b test I could do this with a cookie, setting it to "1" for half the users and "2" for the other half.

(I work on mod_pagespeed, and our experimental framework uses cookies this way: https://developers.google.com/speed/docs/mod_pagespeed/modul...)
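
A sketch of the A/B case described above, where the cookie holds only the bucket and nothing that identifies the user. The cookie name `ab` and the function name are assumptions for illustration; this is not how mod_pagespeed itself is implemented.

```python
import random
from http.cookies import SimpleCookie


def assign_bucket(cookie_header, rng=random):
    """Return (bucket, set_cookie_value), where set_cookie_value is None if already bucketed."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if "ab" in jar and jar["ab"].value in ("1", "2"):
        # Returning visitor: keep them in the bucket they already have.
        return jar["ab"].value, None

    # New visitor: pick a bucket at random and store only that single digit.
    bucket = rng.choice(["1", "2"])
    out = SimpleCookie()
    out["ab"] = bucket
    out["ab"]["path"] = "/"
    return bucket, out["ab"].OutputString()
```

The server learns which half of the experiment a request belongs to and nothing else, which is the privacy property a per-user identifier cannot offer.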

Why must HTTP 2.x be backwards compatible with 1.x? Should SSH have not been created because you can't talk to it with a telnet client? If a new protocol offers sufficient benefits, it would be worth having to make minor changes to apps to support both.

Cookies suck, from a technical and regulatory-compliance standpoint. Plus, I'll finally stop having to clear my cookies every month or so just to log in to my PayPal and American Express accounts. Both sites keep creating unique cookies on every login until there are so many that they pass their own web servers' max header length limits.

> Why must HTTP 2.x be backwards compatible with 1.x?

Because the only benefit of removing cookies is a tiny bit of simplicity which could theoretically allow removing (a small amount of) code browsers will already have to keep around for probably at least a decade to support existing websites. If cookies are mostly unused by the time HTTP/3.x rolls around, we can talk...

> Cookies suck, from a technical

Agreed, but...

> and regulatory-compliance standpoint.

I don't understand this point. Surely the need for regulation of user tracking by websites doesn't depend on whether cookies or an equivalent mechanism are being used? If people start using Not Cookies(tm), they will be unregulated at first, but the law will be changed if the effect is the same.

Edit: Similarly, any protocol that gives a website a persistent identity token without its explicitly requesting one is a bad idea - cookies do provide a modicum of visibility to the user regarding who's tracking them. Not sure exactly what Kamp is proposing.

> Plus, I'll finally stop having to clear my cookies every month or so just to log in to my PayPal and American Express accounts. Both sites keep creating unique cookies on every login until there are so many that they pass their own web servers' max header length limits.

Hah, no you won't. I strongly suspect legacy codebases will remain on HTTP/1.1 approximately forever, at least if 2.0 is backwards incompatible.

If you would bother to read the full thread as well as PHK's position proposal you'd see that removing cookies brings more than a tiny benefit. The overall flavor of his proposal is to reify a concept of session and by that eliminate redundant communication. At the same time we get substantial wins in security.

You can add a concept of session without removing cookies, which are trivial to keep supporting since, as I mentioned, browsers will have to support cookies for legacy sites for a long time anyway.

I doubt the security argument amounts to much, considering that there are few sites with cookie-based vulnerabilities, it's long been trivially easy ($_SESSION in PHP) for any site to use cookies purely as identifiers, and many of the sites that are vulnerable are the kind of old-fashioned things that will never be upgraded anyway.

Once you have a concept of a session-unique nonce, cookies are needless. Browsers can implement a locally encrypted resumption store where the user-entered entropy never touches the network to resume a session on the same machine. Resuming a session on a new machine could use two-factor auth with fallback one-time capabilities. That's big, doubly so given the clear historical trends in users' ability to memorize entropy and the likelihood of hashed password database disclosure.

What you need to understand is that we can get rid of cookies, live in a more secure world, and give up nothing. The only thing holding us back is unwillingness to understand the underlying issues and fear that we stand to lose something by advocating change.

On top of that, by standardizing on a nonce we avoid all cookie request overhead larger than the nonce, which is not trivial. Every mandatory request byte we save under MTU is huge.

There is no difference between browsers implementing a locally encrypted resumption store for a nonce and for cookies (since again, most sites where security is important already use cookies purely as identifiers); nor does it affect whether sites will start requiring two factor for all logins. The two systems are equivalent except that one is simpler, but the (small amount of) complexity of cookies is not what's blocking these types of security measures.

I'm not saying additional login security isn't a good idea (although since local cookies tend to be compromised by malware running on the machine rather than offline attacks, it may not be that useful to encrypt them), and I'm not saying that avoiding cookies isn't a good idea, because cookies do tend to get bloated. But tying the two proposals together is unhelpful to both, because they're essentially orthogonal.

As for avoiding cookie request overhead, that is again something you can do by adding a standard nonce without actually removing the old mechanism; sites that want to be fast or have a standardized way to interact with HTTP routers, and most sites that use web frameworks, as the frameworks get updated, will use the new one. The only way removing cookies would help is if servers started translating cookies for legacy applications automatically, but I don't see that becoming prevalent because of document.cookie and related concerns.

edit: and again, breaking backwards compatibility is a great way to slow HTTP/2 adoption, not that it really matters unless it brings TLS to all sites along with it (but that's another story...)

> and regulatory-compliance standpoint

One of the main reasons people can't just turn off cookies is that they are needed for session management, which makes them very difficult to disable. If there were a dedicated session management method in HTTP/2.0, that would remove a lot of the need for cookies. Then they could be used for what they were intended for (local persistent state). This would also give users better methods for managing them (or just disabling them).

Eh... maybe eventually, once nothing uses cookies anymore (including existing HTTP sites). But surely this can be solved today by having browsers force cookies to expire with the session?

And that is what phk is advocating as part of his proposal. Cookies are the wrong tool to be used for session management.

You probably know this, but cookies are sent with every damn request the client makes. So it's a tax. It may be a small % of total traffic, but most of the time it's useless.

It's easy to work around this in a backwards compatible way, though, in a way that's applicable to more than just cookies - in requests after the first, only re-send headers that have changed.

Or just live with the hit while sites migrate to the new mechanism; I'm fine with it being considered a legacy thing.
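
The "only re-send headers that have changed" idea can be sketched as a simple delta between consecutive requests on one connection. The function names and the (changed, removed) wire shape are hypothetical; both ends are assumed to remember the previous request's headers.

```python
def encode_delta(previous, current):
    """Client side: describe `current` headers relative to the previous request."""
    changed = {k: v for k, v in current.items() if previous.get(k) != v}
    removed = [k for k in previous if k not in current]
    return changed, removed


def apply_delta(previous, changed, removed):
    """Server side: rebuild the full header set from the stored state plus the delta."""
    headers = {k: v for k, v in previous.items() if k not in removed}
    headers.update(changed)
    return headers
```

With this scheme an unchanged Cookie or User-Agent header costs nothing after the first request, at the price of per-connection state on both sides.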

HTTP/2.0 doesn't have to be backwards compatible at all. In fact, I see the future protocol switch being pretty simple. There will be new HTTP/2.0 servers and HTTP/1.1 legacy servers. The clients will speak either language, but 2.0 servers will be faster. Eventually clients will let the user say if they want to talk to 1.1 servers at all.

The initial line will remain the same, except for the version:

    GET /page HTTP/2.0
    *** extra 2.0 headers/request ***

If the server speaks 2.0, it will just carry on. If it doesn't, the server will return a 505 and the client will resubmit the request:

    GET /page HTTP/2.0
    505 HTTP Version Not Supported
    GET /page HTTP/1.1
    *** 1.1 headers / request ***

There is no reason the protocols must be backwards compatible past the first line. Hell, 2.0 could even be binary after that first line. So, while they don't have to be compatible, they can still coexist.
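
The fallback exchange above can be sketched from the client's side as follows. The `send` callable is a hypothetical transport that takes a request line and returns a status code; `legacy_server` is a stand-in for an HTTP/1.1-only server, not a real implementation.

```python
def fetch(path, send):
    """Try HTTP/2.0 first; on a 505, resubmit the same request as HTTP/1.1."""
    for version in ("HTTP/2.0", "HTTP/1.1"):
        status = send(f"GET {path} {version}")
        if status != 505:  # 505 = HTTP Version Not Supported
            return version, status
    raise RuntimeError("server rejected every offered version")


def legacy_server(request_line):
    """Stand-in for a server that only understands HTTP/1.1."""
    return 200 if request_line.endswith("HTTP/1.1") else 505
```

The cost of coexistence in this sketch is one extra round trip against legacy servers, which a client could avoid by remembering which hosts answered 505.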

It's not about "server speaking HTTP/2.0". It's about "application speaking HTTP/2.0".

Every web app today is married with $COOKIE statements. The whole problem for me as a web developer is that existing applications will have to be rewritten just to run on top of a cookie-less HTTP/2.0 protocol.

SPDY, despite all its architectural problems, has taken off in a huge way (even Facebook is implementing it) just because applications don't have to be rewritten. You could just install mod_spdy in Apache and be done with it. From the view of the web developer, these kinds of breaking changes just make life more painful.

Of course this is about the server speaking HTTP/2.0. We are talking about a potential client-server protocol upgrade. The fact that millions of applications are written to the 1.1 spec shouldn't be a factor in working on the 2.0 spec.

I believe that HTTP/2.0 will necessarily require applications to be rewritten. Or at least frameworks will need to be updated in order to take advantage of the new features. And you can expect some pain when other features are taken away. If there is a push to make cookies more 'optional', you can expect that clients will start to let users block cookies entirely. Would you rather your application kinda work for people but not those on 2.0 browsers? No, of course not, you'd rather it work for everyone. So why try to run a 1.1 webapp across the 2.0 protocol?

If you are hard-coding $COOKIE statements in your code, you aren't writing it to a sufficient abstraction to be able to survive a future major version jump. But there's nothing wrong with that. Major version jumps in a protocol are pretty rare, and your code will still work just fine as a 1.1 webapp.

If you're writing applications that expect to be dealing with HTTP requests, then of course you'll have to rewrite applications to run on a major version upgrade of a protocol. This is what will be expected. Major version updates shouldn't necessarily be backwards compatible, and that's the main argument of the post. If the update is marginal, there will be nothing to drive adoption of 2.0 over 1.1.

What I was trying to point out was that, (hypothetically) if HTTP/2.0 isn't backwards compatible, that doesn't mean that HTTP/1.1 and 2.0 applications couldn't co-exist on the same site (or even the same server).

Plus, let's remember, this is all hypothetical - we are still trying to figure out the goals of HTTP/2.0.

An important consideration is that forcing the first line on the HTTP/2.0 server can actually really hurt it. I mostly agree with Kamp's considerations on HTTP routers, and not having all data necessary for routing at fixed offsets in the packet instantly makes their jobs harder.

At the very least, you want the server-provided identity header to be before all the variable-length fields, because in normal situations, most high-throughput servers will be able to fully route their traffic on it alone.

No, I agree that HTTP doesn't make things easier on HTTP routers. Variable-length headers that have to be parsed do nothing for speed.

However, this is the method that HTTP has defined for version upgrades, so if you want to muck around with the first line, you lose the ability to co-exist with HTTP/1.1 on the same port.

And I really doubt that they'll want to switch ports for HTTP/2.0.

One possible method would be forcing width padding of the request-path and Host headers. This would potentially make it possible to use fixed offsets. But this strikes me as inelegant.
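
A sketch of what fixed-width padding could look like, so that a router can read the host and path from known offsets without parsing. The field widths, the field order, and the space padding byte are all assumptions made for illustration; nothing like this appears in an actual proposal.

```python
HOST_WIDTH = 32   # hypothetical fixed width for the Host field, in bytes
PATH_WIDTH = 64   # hypothetical fixed width for the request path, in bytes


def pack_routing_prefix(host, path):
    """Pad host and path to fixed widths so both sit at known offsets."""
    h = host.encode("ascii")
    p = path.encode("ascii")
    if len(h) > HOST_WIDTH or len(p) > PATH_WIDTH:
        raise ValueError("field too long for its fixed-width slot")
    return h.ljust(HOST_WIDTH, b" ") + p.ljust(PATH_WIDTH, b" ")


def route_fields(packet):
    """Router side: slice at fixed offsets, no header parsing needed."""
    host = packet[:HOST_WIDTH].rstrip(b" ").decode("ascii")
    path = packet[HOST_WIDTH:HOST_WIDTH + PATH_WIDTH].rstrip(b" ").decode("ascii")
    return host, path
```

The inelegance is visible even here: every request pays for the full slot width, and anything longer than the slot has to be rejected or handled by some escape mechanism.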

I dunno - obviously different projects diverge on required backwards compatibility (Python 2/3?), but SSH isn't called Telnet v2.

For something as fundamental as HTTP the author argues changes need to be radical to drive adoption, but at the same time there's not necessarily widespread impetus to do so if the burden is too high. This is engineering on a 15-year time-scale, which I feel a little young (at 24) to comprehend well!

It's not just apps in the web-app sense, but user agents (all the way down to embedded systems) that would need changing to take advantage of the 'sufficient benefits'. That's a pretty massive undertaking.

HTTP 1.x will not disappear just because 2.x is available. Millions of people still run wifi networks on 802.11b; all those routers still keep on workin' even though 802.11a/g/n had 'sufficient benefits' that we build those into new hardware. There was no massive undertaking to replace all the routers in the world.

Nobody would have to change their old apps or hardware. Like SPDY, the availability of a new protocol supported by web browsers just means new stuff can optionally do things old stuff can't.

It doesn't seem like the proposals are "fully" compatible with HTTP. Some of them are entirely different encodings. And I doubt people are thinking of carrying over comments in headers and line folding...

What actually is the proposal to eliminate cookies? Just provide some fixed "identifier" type field?

> And I doubt people are thinking of carrying over comments in headers and line folding...

Heh, I don't think any reasonable web apps actually depend on the value of those :)

> What actually is the proposal to eliminate cookies? Just provide some fixed "identifier" type field?

Unfortunately, I don't think there is a concrete proposal to compare to, other than

    Given how almost universal the "session" concept on the Internet we
    should add it to the HTTP/2.0 standard, and make it available for
    HTTP routers to use as a "flow-label" for routing.

I'd say the cookies situation could be solved by one-page apps. Because the software persists across page-views, it'll be able to maintain session using some other mechanism (localStorage, the lifetime of the tab, whatever). The user could then disable cookies and not be worried about tracking.

So your "solution" for cookies is to mandate that all web apps must use ajax, and pass session IDs as part of the URL for all requests?