by duckharp 5079 days ago
But what about preferences for anonymous users? Store that on the server side? Append them to the URL? Both options kinda suck.

Also, consider dabblet. The way it allows you to store your stuff using github is very smart IMHO.


Store it on the server: The user-agent gives you a session-id to use as key.

It may be that session-keys should tell if they are anonymous or if they represent (locally) authenticated users, but that's a very complex subject I won't claim to have a clear opinion of yet.

Store the settings of anyone who ever connected? For how long? Forever, just in case? Silly. And why do you even assume the server has to have a database? Why should it be required to have one, and why should it have to store the stuff? What is your take on statelessness? You concentrate so much on the abuses of cookies and client-side storage/computation, but you're not addressing the advantages. I doubt you're aware of them, to be honest.
Uhm, isn't that how it works today? Do you care about how many metric shitloads of storage your cookies take up on clients' disks? Shouldn't you?

Putting the cost of storage where the decision to store is made is sound economic practice.

My Cookies directory is 11 MB. That's actually quite a lot, considering the length of the average cookie, but my disk is 256 GB, and it has only gotten that big because I've been browsing for literally years without ever clearing my cookies. I can clear them at any time.

This is really a non-issue.

It is the user's session data. If it is stored on their end, they can choose how long they wish to store it for, and delete it any time they like.
The User-Agent, and the other bits of stuff duly noted by http://panopticlick.eff.org/, are much-much-much worse than cookies. Cookies you can erase. The User-Agent and other "fingerprints" are with you forever. And they travel with you no matter where you are.

So, while you would dismiss the "privacy hazard" that the cookies are, you replace it with something much worse.

You can still have the cookie concept, and have the session id be a random number each time someone opens a tab on the site. The cookie can hold those preferences, and the session id can be used for session stuff. As a bonus, you can then load the cookie only on the first page load, and keep the values in a cache associated with the browser's random session number, saving on data transfer and losing nothing. And those who don't need cookies get a big win in terms of privacy.
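To make the idea concrete, here is a minimal Python sketch of the server side, assuming an in-memory cache. The names (SessionPrefs, first_load, later_load) are made up for illustration and do not come from any real framework.

```python
import secrets


class SessionPrefs:
    """Hypothetical server-side cache keyed by a random per-tab session id."""

    def __init__(self):
        self._cache = {}

    def first_load(self, cookie_prefs):
        # First page load: the client sends its preference cookie once.
        # Mint a fresh random id that has no relation to any earlier visit.
        session_id = secrets.token_hex(16)
        self._cache[session_id] = dict(cookie_prefs)
        return session_id

    def later_load(self, session_id):
        # Later requests carry only the session id; the preferences
        # come from the cache. Unknown ids yield None.
        return self._cache.get(session_id)


server = SessionPrefs()
sid = server.first_load({"theme": "dark"})
print(server.later_load(sid))
```

A real server would also have to expire cache entries, which this sketch leaves out.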
OK, so if I grok the idea correctly, it is something like "send the cookie-like data from the client only on the first GET, if you are doing it over a single HTTP/1.1 TCP connection". That makes sense, and could easily be made into an extension to HTTP/1.1 (though it creates a dependency between the different GET requests): have the server send an "X-Dont-Send-Me-More-Cookies-in-this-TCP: yes!" header, and make compliant clients react to it.
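As a rough sketch of how a compliant client might react, assuming the header exists only as proposed in this thread (it is not a real HTTP header, and the Connection class below is hypothetical):

```python
DONT_SEND = "X-Dont-Send-Me-More-Cookies-in-this-TCP"


class Connection:
    """Models one persistent HTTP/1.1 connection from the client's side."""

    def __init__(self, cookies):
        self.cookies = cookies
        self.suppress = False  # flips once the server opts out

    def request_headers(self):
        # Send the Cookie header only until the server says it has enough.
        if self.suppress or not self.cookies:
            return {}
        pairs = "; ".join(f"{k}={v}" for k, v in self.cookies.items())
        return {"Cookie": pairs}

    def on_response(self, headers):
        # Header names are case-insensitive, so compare in lowercase.
        for name, value in headers.items():
            if name.lower() == DONT_SEND.lower() and value.strip().lower().startswith("yes"):
                self.suppress = True


conn = Connection({"theme": "dark"})
print(conn.request_headers())          # first GET carries the cookie
conn.on_response({DONT_SEND: "yes!"})
print(conn.request_headers())          # later GETs on this connection do not
```

The flag lives on the connection object, so a new TCP connection starts by sending the cookie again. That is the dependency between requests mentioned above.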

What I do not understand is where the win on the privacy front is here. You send the random ids, but the site owner will re-correlate these random ids with your identity. So you would not win anything here. Or what am I missing?

My take on the privacy:

There is no problem with someone collecting a bunch of info about me and using it to improve their services.

There is a little bit of a problem with someone collecting a bunch of info about me and another million people and keeping that in a big blob.

There is a big problem when that someone gets hacked and this bunch of info about another million people gets to the bad kids.

It's the centralization of a lot of data that is bad for the privacy.

Store the data locally on the clients and give it to the server only when it is contextually needed. E.g., my shipping address: I am happy for my browser to supply it to you from my local storage every time you want to ship me something. I am very happy if you do not store this address and sell it to someone who will later send me snail-mail spam. Or store it without due diligence ('cos time to market and all that), then get hacked, and then I find myself "having paid" for helicopter spare parts.

Of course, this would hurt the nouveau business models that treat the users as a product. And will make the analytics harder - because one would not be able to just run a select... But to me it could be a useful tradeoff.

(Above, I use the term "client" to refer to the collective set of the devices that are "mine". As I wrote in another reply, storing the state on the client does not imply a difference in the user-visible behavior, so the shopping cart should survive.)

Of course, keeping the data decentralized on your computer is super secure; this is why botnets logging users' data never got beyond theory. It is also why phishing was a clever idea but never panned out: people would only send data to the right recipients. </snark>

Sure, centralized data sounds big and scary, because a single security incident loses a million people's data in one go, but how is that any different from a million security incidents in which a virus loses "only" one person's data each?

Similarly, I don't understand how it is remotely feasible to think that storing your shipping address on your computer rather than on the site that is shipping you stuff changes things. I mean, they still have to get your address to send you the stuff you ordered. It is a fundamental requirement of shipping. An address is not a private bit of info.

Fingerprinting will be around, so there will probably still be tracking. We can't beat that right now, so let's not conflate it with other problems. Instead, let's look at the problems that are solved: cookies store data that makes it easy not just to correlate and be probably right about the user, but to be perfectly right. Further, they can be hijacked or otherwise stolen and used by malicious third parties, giving away data beyond just the access patterns for the site in question. Session ids can be engineered not to have this inherent problem, cutting down information leakage. Further, I imagine plugins that will keep track of your worst data offenders and force a new session id on every request to them, making the data tracking and correlation even more difficult.

It isn't an all-or-nothing game: even if you only get rid of the low-hanging-fruit abuses, it is a win. Yes, new stuff will come along, but that doesn't mean we shouldn't try, particularly when the current scenario allows all the bad stuff you can think of, only more easily.

Re. snark. Phishing: it is not the physical user who has to input the data. Think of how you use a password manager. Botnets: yes, but since I keep my computing devices clean, I was never a victim of a botnet, while my account info was stolen from one of the online sites, with zero influence on my part. See where the difference is?

The difference is that the decentralized approach would put more control in the hands of the user (so they either take care themselves or hire someone to take care for them). If they want to.

"Address is not a private bit of info": it's person- and context-dependent. Some people consider their name a private bit of info in some contexts... And yes, you have to send the shipping info to the remote party for them to ship you stuff. But they do not have to keep it neatly packed one select away.

I still have difficulty understanding how the "random session-id" will solve the problem of privacy. All I can see happening is one more level of indirection, which will lead to the creation of frameworks that re-collate the ids. Because this is functionality that developers need. And once you have commonly available code for it, you're back at the previous stage, except with an additional pile of code to debug.

I'm not saying all of this because I think we should stop trying. It's just that I can't see how the cost of uplifting the entire internet infrastructure (the code required for this functionality will surely take up much more storage than my cookies will over my lifetime) and the cost of having programmers support both models for a good chunk of the future (hello, IE6 users, I am looking at you! :-) justify the incremental feeling of security that this gives.

edit: re. sending the data to the trusted server: sign with your client key a "request for data", together with the manifest of the addresses that the server can plausibly have. Then, when the server needs the data, it can present this request to your UA and get the data. Yes, the server can be hacked and this data can be siphoned off. But then the attackers get only the [timespan of the breach] worth of user data, and not the entire DB.
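A minimal Python sketch of that flow, under loud assumptions: the "client key" is modeled as a secret that never leaves the UA, the signature is an HMAC from the standard library (a real design might well use an asymmetric signature instead), and the manifest is treated as a list of data fields the server may ask for. The function names sign_grant and release are made up.

```python
import hashlib
import hmac
import json


def sign_grant(client_key, server, fields):
    """UA side: sign a 'request for data' that the server keeps instead of the data."""
    manifest = {"server": server, "fields": sorted(fields)}
    payload = json.dumps(manifest, sort_keys=True).encode()
    sig = hmac.new(client_key, payload, hashlib.sha256).hexdigest()
    return {"manifest": manifest, "sig": sig}


def release(client_key, grant, local_data):
    """UA side: hand over the listed fields only if the grant is one we signed."""
    payload = json.dumps(grant["manifest"], sort_keys=True).encode()
    expected = hmac.new(client_key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, grant["sig"]):
        return None  # forged or tampered grant: release nothing
    wanted = grant["manifest"]["fields"]
    return {f: local_data[f] for f in wanted if f in local_data}


key = b"secret-kept-on-the-client"  # hypothetical client key
local = {"shipping_address": "1 Main St", "phone": "555-0100"}
grant = sign_grant(key, "shop.example", ["shipping_address"])
print(release(key, grant, local))
```

The server stores only the grant, so a breach exposes whatever data passed through during the breach rather than a table of addresses.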